DOCUMENT RESUME

ED 050 176

AUTHOR McLaughlin, Kenneth F.
TITLE Interpretation of Test Results.
INSTITUTION Office of Education (DHEW), Washington, D.C.
REPORT NO OE-25038; Bulletin 1964, No. 7

DESCRIPTORS ... Expectancy Tables, ... Item Analysis, Multiple Choice Tests, ... Standardized Tests, Test Construction, Testing Programs, Test Interpretation, Test Reliability, Test Results



ABSTRACT

This bulletin attempts to explain the use and limitations of regularly administered tests so that administrators, counselors, and teachers can better interpret their meaning to parents and students. A companion publication, "Understanding Testing, Purposes and Interpretations for Pupil Development," was issued in 1960. A general discussion of the development of a standardized test is followed by consideration of specific types of tests, including intelligence or scholastic aptitude tests and achievement tests. Scoring a multiple-choice test, the accuracy of test results, and the analysis of class achievement are also discussed. A section on classroom interpretation of test scores provides suggestions on how to handle the interpretation of this material with students and parents. An extensive list of selected references is included. (TA)







OE-25038

Bulletin 1964, No. 7

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.



Interpretation of Test Results



by Kenneth F. McLaughlin
Specialist, Appraisal of the Individual 



U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE

Anthony J. Celebrezze, Secretary
Office of Education, Francis Keppel, Commissioner







Printed 1964 
Reprinted 1965 



Superintendent of Documents Catalog No. FS 5.225:25038






U.S. GOVERNMENT PRINTING OFFICE

WASHINGTON: 1965

For sale by the Superintendent of Documents, U.S. Government Printing Office
Washington, D.C. 20402. Price 10 cents






Foreword 

Under title V, Guidance, Counseling, and Testing, of the National Defense Education Act of 1958, the Congress of the United States has recognized the value of tests as a tool which may be used to help make an early determination of the aptitudes and abilities of the students in our schools. This bulletin attempts to explain the use and limitations of regularly administered tests, so as to enable administrators, counselors, and teachers to interpret their meaning better to parents and students.

Participation of a student in a testing program and the recording of the test scores on his cumulative record are not sufficient. Each counselor or teacher who works with a student, as well as the student himself, should know the student's strong and weak points: the strong points in order to develop them further, the weak ones in order to recognize limitations and to determine where extra effort must be applied. Test results, properly interpreted, can be of great assistance to all concerned with the instruction of youth.

A companion publication, Understanding Testing, Purposes and Interpretations for Pupil Development, was issued in 1960 (OE-26003). Both of these publications have been prepared by the Guidance and Counseling Programs Branch.

Arthur L. Harris 
Associate Commissioner 
Bureau of Educational 
Assistance Programs 







Contents

Foreword

I. Introduction

II. Development of a Standardized Test
Tests as Samples
Construction of a Standardized Test
Use of Test Results

III. Intelligence, Mental Ability, or Scholastic Aptitude Tests

IV. Achievement Tests

V. Scoring a Multiple-Choice Test

VI. Accuracy of Test Results

VII. Analysis of Class Achievement
Scattergram Analysis
Prediction Tables
Error Analysis:
Made Outside of the Classroom
Made Inside the Classroom
Item Analysis Methods
"High-Low"
"Alternate Response"
By Test Scoring Machine
By Typewriter

VIII. Classroom Interpretation of Test Scores

IX. Interpretation of Test Results to Parents
Individual Conferences
Group Conferences

X. Selected References



Figures

1. Method for Plotting Pairs of Percentiles on a Scattergram
2. Method for Plotting Pairs of Percentiles on a Scattergram for a Single Class
3. Expectancy Table for Predicting the Probability that a Student with a Certain Grade-Point Average in the Ninth Grade Will Obtain a Certain Grade-Point Average in the Tenth Grade
4. Table Showing Errors Made on a Test of Arithmetic Concepts
5. Examples of High-Low Item Analysis [N=36]
6. Sample Item Analysis Sheet
7. Sample of an Item Analysis by Typewriter With a Detailed Analysis of Item 15 [N=36]
8. Profile Sheet for Sam Smith of Patnnka High School, Grade 10



I. Introduction

IDENTIFICATION OF HUMAN ABILITIES or aptitudes is not easy. Tests can be helpful, but their results alone will not provide the information needed to solve all problems or to answer all questions concerning abilities or future areas of occupational success. Test scores can suggest a level of ability, which in turn may point to areas of likely success. However, it must be understood that even the best predictor can only be considered as indicating "a likelihood of success" or "the odds in favor of success." For example, one might say that a youth with a certain score on a specified test has three chances to one that he may succeed as an engineer. This also means, however, that there is still one chance in four that he will fail in this particular area.
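The odds statement above is simple arithmetic: odds of a to b in favor of success correspond to a probability of a/(a + b). Three chances to one therefore means 3/4 for success and 1/4 for failure. The following is a minimal Python sketch of that conversion; the function name is ours and is for illustration only.

```python
from fractions import Fraction


def odds_to_probability(in_favor: int, against: int) -> Fraction:
    """Turn odds such as "three chances to one" into a probability of success."""
    if in_favor < 0 or against < 0 or in_favor + against == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return Fraction(in_favor, in_favor + against)


success = odds_to_probability(3, 1)  # three chances to one
failure = 1 - success                # the remaining chance of failure
print(success, failure)              # 3/4 1/4
```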

For many years teachers have designed a small number of tests and administered them to their students in order to evaluate day-by-day learning. Occasionally, some of these tests have been given orally to one student at a time; some have been administered to groups of students in only one or two classes. Today, however, a standardized test¹ or a battery of tests is frequently administered to all of the students in a large number of classes in several grades in a single school, or in all grades in every school within a school system, or within a political subdivision such as a county or a State. To make the best use of the minutes or hours scheduled for such standardized testing, there must be careful preplanning by the teacher and cooperation by the student.

Periodic testing permits the student to evaluate his accomplishments, to determine possible weak points in one or more areas, and to compare himself with the average for other students of a similar grade or age. Availability of test scores soon after the administration of a test:



¹ A standardized test is an instrument designed for a specific purpose. It has been carefully constructed with the cooperation of master teachers and subject-matter specialists. It must be administered under uniform conditions and scored in a predetermined manner. It must be interpreted in terms of the norms determined for a specified population.

1. Encourages the teacher to examine closely the current learning level of the student so that the course of study may be adapted to individual needs.

2. Permits the principal to ascertain the average class level of the students in his school so that he can determine whether previously established goals are being attained.

3. Helps the [...] to obtain objective information which may be used as a basis for research in curriculum development.

4. Suggests to school board members whether the curriculum is meeting the needs of their students.

5. Provides objective data which can inform the community of the accomplishments of its students as compared with national averages for students in similar grades.

It is sometimes said that a teacher, after having a student in class for several months, can tell as much about his academic ability as can be learned from the results of tests given at the beginning of the fall term. However, it should not be necessary for a teacher to wait a number of weeks before acquiring the information needed to continue instruction at a child's current learning level. Early objective test results are subject to immediate verification by comparison with actual classroom accomplishment.

Relatively few teachers can recognize all the able individuals in their classes. Russell and Cronbach² referred to a study in which a psychologist asked each of 6,000 teachers to name the "most intelligent" child in his class. It was found that, on the basis of other evidence available about the child, only 15 percent of the teachers made a correct choice. Of course, they may have recognized other students with high potential, but they nevertheless failed to identify many of the best students.

Different teachers of the same subject have different grading standards. It would be difficult to compare one student with another from class to class or from year to year without using some common measuring instrument. A well-constructed standardized test, properly administered, can reveal in a minimum amount of time a great deal about a student's aptitudes, current achievement level, or interests.

Results of tests given to all students in a grade within a school have proved useful, along with other information from the cumulative record, as a means for grouping students with similar abilities. Many school administrators believe that a teacher should be able to accomplish more with his students if most of them have about the same level of ability. With students at a similar level, the teacher does not have to restrain the bright ones while drilling the slow, or lose contact with the slow while trying to interest the bright and anticipate their sharper questions.

² Russell, Roger W., and Cronbach, Lee J. Report of Testimony at the Congressional Hearing before the Committee on Labor and Public Welfare on Feb. 27. The American Psychologist 13, March 1958.

It is not to be inferred, however, that all of the same students should be kept together for all classes because of the results of one general aptitude test. In many situations, for example, it might be best to separate the students in classes of arithmetic, English, or science. Even though students may be roughly grouped in the assignment to a particular class section, there may still be a wide variation in ability within each class. It is at this point that a careful analysis of test results would help the teacher diagnose quickly the weak and strong points of each student.

The administration of tests is not an end in itself. Tests should never be given simply for the sake of filling in the blanks on a student's cumulative record card. Each test should be administered for a specific purpose and used to help the student determine his educational or vocational goals. The results of standardized tests can be helpful to the student, his parents, and his teachers, as together they plan a worthwhile school program.






II. Development of a Standardized Test

It would be difficult to design a 40- to 90-minute test which would cover completely any particular field of knowledge. For example, how could a single 40-minute test include all that a student should know about English literature? (It is not disputed that in a few cases the test could provide a 100 percent sample of the student's knowledge.) Or how could a teacher, in 90 minutes, test for student understanding of all of the theorems of a plane geometry course?

Tests as Samples

Since complete test coverage of a subject is not possible, it becomes necessary to take a sample of all possible items in a specified course or in a particular subject-matter area. This can be done fairly well by a classroom teacher if he follows certain procedures during several succeeding semesters. However, a test publisher has already completed such procedures when he has constructed a standardized test designed to measure achievement in a specified area. Further, the test publisher has spent many months and thousands of dollars in completing the processes necessary to make available a test which meets the consumer's requirements for reliability, validity, and norms.

Construction of a Standardized Test 

One of the advantages of a standardized test³ is that a professional testmaker constructs it according to subject specifications determined by a committee of experts in a particular subject. The test agency selects these experts from the appropriate academic level: elementary, secondary, or college. This committee, after examining numerous textbooks and courses of study from many parts of the country, determines those topics common to most of the curricula for the particular subject and grade.

The committee develops a table of specifications, or predetermined "skeleton" of topics, in outline form. It decides what proportion of items in the total test should be assigned to each topic in order to give a reasonable balance, based upon the relative importance of the different subtopics. Members of the committee and other specialists write a large number of test items in the appropriate form to fit the predetermined outline. Objective test items may appear in any one of a number of forms, such as true-false, matching, or multiple-choice. For most subjects the item writers put the items in a four-choice or five-choice multiple-choice form.

³ McLaughlin, Kenneth F. Understanding Testing: Purposes and Interpretations for Pupil Development. U.S. Office of Education. Washington: U.S. Government Printing Office, 1960. (OE-26003) p. 4.
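Turning the table of specifications into item counts is a proportional split. The sketch below is a minimal Python example under stated assumptions. The topic names and percentage weights are hypothetical, and largest-remainder rounding is our own choice; the bulletin does not say how committees round. The rounding keeps the counts summing to the planned test length.

```python
import math


def allocate_items(weights: dict[str, int], total_items: int) -> dict[str, int]:
    """Split total_items among topics in proportion to their weights.

    Largest-remainder rounding keeps the counts summing to total_items.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    quotas = {t: total_items * w / weight_sum for t, w in weights.items()}
    counts = {t: math.floor(q) for t, q in quotas.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the topics with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda t: quotas[t] - counts[t], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts


# Hypothetical outline for a 40-item plane geometry test (weights in percent).
outline = {"congruence": 30, "similarity": 25, "circles": 25, "area": 20}
print(allocate_items(outline, 40))
# {'congruence': 12, 'similarity': 10, 'circles': 10, 'area': 8}
```

Integer weights are used here so that the even splits in the example stay exact in floating point.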

The committee sorts all related or similar items and uses its 
best judgment to select the required items for each main topic or 
subtopic of its outline. During this process the committee may as- 
semble several parallel test forms. 

Next, as a pretest or tryout, the testmaker arranges to administer the test forms in representative schools to a sample of students of the age or grade for which the test is designed.

After scoring, the committee analyzes each test item to determine its difficulty; that is, the percent of students who marked each item correctly. The committee rejects any item which all students mark correctly or all mark incorrectly, since such an item would have no effect on the relative ranking of the students.
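The difficulty check described above is easy to express in code. In this minimal Python sketch, difficulty is reported as a proportion (multiply by 100 for a percent), and an item with difficulty 0 or 1 is flagged because it cannot change anyone's rank. The function names and the small tryout matrix are hypothetical.

```python
def item_difficulty(responses: list[list[bool]]) -> list[float]:
    """responses[s][i] is True when student s answered item i correctly.

    Returns, for each item, the proportion of students answering it correctly.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def non_discriminating_items(difficulties: list[float]) -> list[int]:
    """Indices of items that everyone passed (1.0) or everyone missed (0.0)."""
    return [i for i, p in enumerate(difficulties) if p in (0.0, 1.0)]


# Hypothetical tryout: 4 students, 3 items.
tryout = [
    [True, True, False],
    [True, False, False],
    [True, True, False],
    [True, False, False],
]
p = item_difficulty(tryout)
print(p)                            # [1.0, 0.5, 0.0]
print(non_discriminating_items(p))  # [0, 2]
```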

After placing the students' papers in order from high to low, the committee selects a high and a low group of papers, and checks each item for its discriminating power; that is, the percent of pupils with high total scores answering the item correctly is compared with the percent of low-scoring pupils choosing the correct answer. If the item is a good one (i.e., if it discriminates), more students in the high group than in the low group should mark the correct response.
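The high-group versus low-group comparison can be sketched as a simple difference of proportions. The bulletin does not say how large the high and low groups are. This minimal Python sketch assumes the top and bottom 27 percent of papers, a common convention, through a hypothetical `fraction` parameter. Keep `fraction` at or below 0.5 so that the two groups do not overlap.

```python
def discrimination_index(scores: list[int], correct: list[bool], fraction: float = 0.27) -> float:
    """Proportion correct in the high group minus proportion correct in the low group.

    scores[s] is student s's total test score; correct[s] says whether s
    answered this particular item correctly. The result runs from -1 to 1;
    a positive value means the item favours the high scorers.
    """
    if len(scores) != len(correct) or not scores:
        raise ValueError("need one score and one item result per student")
    n = len(scores)
    k = max(1, round(fraction * n))  # size of each extreme group (assumed 27%)
    order = sorted(range(n), key=lambda s: scores[s], reverse=True)
    high, low = order[:k], order[-k:]
    p_high = sum(correct[s] for s in high) / k
    p_low = sum(correct[s] for s in low) / k
    return p_high - p_low


# Hypothetical class of ten, already listed from highest to lowest total score.
totals = [48, 45, 44, 40, 38, 30, 28, 25, 20, 15]
item_7 = [True, True, True, False, True, False, True, False, True, False]
print(round(discrimination_index(totals, item_7), 2))  # 0.67
```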

Next, the committee checks to discover whether or not some of the students in the sample chose each of the distractors, or incorrect choices. If no student, or only a very small number, selected a distractor, a member of the committee writes a new one to use in the next tryout of the item.
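The distractor check amounts to tallying how many students chose each wrong option. This minimal Python sketch makes the cutoff for "a very small number" a hypothetical `minimum` parameter, because the bulletin gives no figure. The item responses are invented.

```python
from collections import Counter


def distractor_counts(choices: list[str], options: str, key: str) -> dict[str, int]:
    """How many students picked each incorrect option (distractor)."""
    tally = Counter(choices)
    return {opt: tally[opt] for opt in options if opt != key}


def weak_distractors(choices: list[str], options: str, key: str, minimum: int = 1) -> list[str]:
    """Distractors chosen by fewer than `minimum` students: candidates for rewriting."""
    return [opt for opt, n in distractor_counts(choices, options, key).items() if n < minimum]


# Hypothetical tryout of one four-choice item whose keyed answer is B.
answers = list("ABACBABBAB")
print(distractor_counts(answers, "ABCD", "B"))    # {'A': 4, 'C': 1, 'D': 0}
print(weak_distractors(answers, "ABCD", "B"))     # ['D']
print(weak_distractors(answers, "ABCD", "B", 2))  # ['C', 'D']
```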

The committee selects the items which meet the required standards of difficulty and discrimination and assembles the needed final test forms. The testmaker then administers these tests to a national sample of students and establishes national norms.

If a teacher completes an analysis of the items on a standardized test, he will discover that in a small class some items may not discriminate, and a few items may be too easy or too difficult. Such information is a useful indicator of the coverage of his course as compared with other courses in the country. The real teaching purpose of such an analysis, however, is to use the test as a diagnostic instrument. The teacher discovers how many students missed the items in a particular section of the course and can reteach those concepts.
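A teacher using the test diagnostically mainly needs wrong answers grouped by the part of the course each item samples. The following is a minimal Python sketch. The section labels, the item-to-section mapping, and the class results are all hypothetical. A teacher would have to tag each item with its section first.

```python
def misses_by_section(responses: list[list[bool]], item_section: list[str]) -> dict[str, int]:
    """Total wrong answers per course section.

    responses[s][i] is True when student s answered item i correctly;
    item_section[i] names the part of the course that item i samples.
    """
    totals = {section: 0 for section in item_section}
    for row in responses:
        for i, right in enumerate(row):
            if not right:
                totals[item_section[i]] += 1
    return totals


sections = ["fractions", "fractions", "decimals", "percent"]
class_results = [
    [True, False, True, False],
    [False, False, True, False],
    [True, True, True, False],
]
print(misses_by_section(class_results, sections))
# {'fractions': 3, 'decimals': 0, 'percent': 3}
```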

No classroom teacher has the facilities to complete all of the steps of an item analysis before administering a test of his own construction to one of his classes for the first time. However, he can make a table of specifications and write items to fit the purposes of the course. After the first administration of the test, he can also complete an item analysis on his test which will point out the students' errors and will help the teacher improve test items which he may use in future tests. Various methods for obtaining item analysis information will be suggested in later sections of this bulletin.

Because the curriculum for each particular subject may vary from school system to school system, it is generally recommended that, before a standardized test is selected for use in a school, a committee of teachers in the subject-matter area examine several of the available tests to determine which one most closely fits the local curriculum. If this is not done, test scores may not be as high as expected. If the tests administered are too difficult, some of the students may feel they are not progressing as they should, and those with the lowest scores may have an unwarranted feeling of failure and lack of progress. On the other hand, if the test is too easy for the group, some students may receive such high ratings that they may become overconfident.

The teacher and administrator must understand that the norms 
accompanying a standardized test may be based upon a population 
which differs from that of their school. Whether or not this is 
true may be determined by examining the test interpretation 
manual. 

Use of Test Results 

Teachers and parents sometimes expect a test to diagnose all difficulties or point out a well-defined road that the student can follow until he reaches his goal. However, the road is more like one found on an ocean beach. One can see where many cars have driven, but the road is a wide one. When driving along, one can swing several feet to either side without difficulty and still be heading in the same general direction. Similarly, test results may indicate a desired direction, but other available information must be used to help determine the path which each student may follow to reach his goal. A test score is one of the tools of guidance. It must be used in association with other information concerning the child's background, environment, strengths, and weaknesses.






III. Intelligence, Mental Ability, or 
Scholastic Aptitude Tests 



The purpose of intelligence, mental ability, or scholastic aptitude tests is to provide an estimate of the ability of an individual to learn or to acquire understanding. It is sometimes said that an individual who is high in such abilities is capable, among other things, of successfully coping with novel situations to which he may be subjected. Because it is rather difficult to design tests which will indicate the level of ability necessary to reason in new situations, such abilities must be measured indirectly by tests which emphasize knowledge of vocabulary, skill in the discovery of underlying patterns, and the ability to manipulate both mathematical and abstract symbols.

A group intelligence test, when administered properly, results in a raw score which must be converted to a mental age (MA) or to some other meaningful score for comparative purposes. The mental age corresponding to each score is determined by first giving the test to large samples of students at each chronological age. Then the average score for each age is computed and a table constructed so that the teacher can determine the mental age, in years and months, corresponding to each test score. Note, however, that this averaging method for determining the MA scale immediately suggests that the "true" MA of a particular student might be a little higher or a little lower than that indicated by the table. That is, the teacher should not imply that, in a group of students of the same chronological age, a student with a computed mental age of 110 months actually has a higher mental age than a student with a mental age of 108 months, or a lower mental age than one with a mental age of 112 months. If another test were given, the mental age order of these students might be reversed. In other words, the values obtained should be used like the 1/4-inch marks of a carpenter's rule and not like the 1/100-inch rulings on the micrometer of the machinist.
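The norming procedure just described (average the raw scores of each age group, then read a pupil's mental age off the resulting table) can be sketched as follows. Several details here are assumptions rather than the bulletin's method: straight-line interpolation between neighboring age averages, clamping at the ends of the table, and the tiny hypothetical norming sample. The sketch also assumes that the averages rise steadily with age.

```python
from bisect import bisect_left


def build_ma_table(samples: dict[int, list[int]]) -> list[tuple[float, int]]:
    """(average raw score, chronological age in months) pairs, sorted by score.

    samples maps each chronological age (in months) to the raw scores
    obtained by the norming sample of that age.
    """
    return sorted((sum(scores) / len(scores), age) for age, scores in samples.items())


def mental_age(raw: float, table: list[tuple[float, int]]) -> float:
    """Mental age in months for a raw score, by straight-line interpolation
    between neighbouring age averages (clamped at the ends of the table)."""
    averages = [avg for avg, _ in table]
    if raw <= averages[0]:
        return float(table[0][1])
    if raw >= averages[-1]:
        return float(table[-1][1])
    j = bisect_left(averages, raw)
    (s0, a0), (s1, a1) = table[j - 1], table[j]
    return a0 + (a1 - a0) * (raw - s0) / (s1 - s0)


# Hypothetical norming data: three age groups, three pupils each.
norms = build_ma_table({108: [20, 22, 24], 120: [28, 30, 32], 132: [36, 38, 40]})
print(mental_age(30, norms))  # 120.0
print(mental_age(26, norms))  # 114.0
```

The interpolated 114.0 illustrates the caution above: a value like this sits between two table entries and should be read with the precision of a carpenter's rule, not a micrometer.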

To compute the most commonly known IQ, or Intelligence Quotient, one forms a ratio, or quotient, of the mental age (MA) divided by the chronological age (CA); this quotient is multiplied by 100 in order to eliminate decimal points. The preceding statement may be written as follows:

$$\mathrm{IQ} = \frac{\mathrm{MA}}{\mathrm{CA}} \times 100$$

Thus, if it is determined that a student has a mental age of 10 years, or 120 months, and his chronological age is also 120 months, then the ratio of 120 divided by 120 is equal to 1. When this 1 is multiplied by 100, one obtains the ratio IQ of 100. Thus:

$$\mathrm{IQ} = \frac{120}{120} \times 100 = 100$$

Again, if a child happens to have a mental age of 132 months and his chronological age is 120 months, then his IQ will be greater than 100. In this case, it would be equal to 110. That is:

$$\mathrm{IQ} = \frac{132}{120} \times 100 = 110$$

Further, a child with a mental age of 96 months and a chronological age of 120 months would have a below-average IQ of 80. That is:

$$\mathrm{IQ} = \frac{96}{120} \times 100 = 80$$
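The ratio IQ is a one-line computation. This minimal Python sketch (the function name is ours) reproduces the three worked examples above. It multiplies before dividing so that these whole-month cases come out exactly in floating point.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ = MA / CA x 100 (both ages in months)."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply before dividing so whole-month examples stay exact in floating point.
    return mental_age_months * 100 / chronological_age_months


for ma, ca in [(120, 120), (132, 120), (96, 120)]:
    print(ma, ca, ratio_iq(ma, ca))
# 120 120 100.0
# 132 120 110.0
# 96 120 80.0
```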

The ratio IQ just described has several disadvantages which have been highlighted by recent research. The ratio IQ is based upon the idea that a child's rate of mental development is fixed; this has been found to be untrue. Technical characteristics of a scale, related to the difficulties of the items used, cause different variabilities to occur at different ages. Finally, it has been suggested that one should not apply the ratio IQ to persons over age 13.⁴

⁴ Cronbach, Lee J. Essentials of Psychological Testing. 2d ed. New York: Harper & Brothers, 1960. p. 171.

The familiar individually administered Stanford-Binet IQ was computed by the above ratio method and had a standard deviation, or variability, of 16. (This means that if the average intelligence of the whole population is considered to be 100, then the IQ's of the middle two-thirds of the population would lie within a range of values from 16 points below 100, i.e., 84, to 16 points above 100, i.e., 116.) The revised (1960) edition of this test and some of the more recent intelligence tests have reported results in terms of "deviation IQ's." Under this method the mean score for a particular age has been considered to be an IQ of 100, and whatever MA falls at a position of one standard deviation above the mean for each age may be converted to an IQ of 116, if there is a desire to establish a correspondence with the Stanford-Binet. If all intelligence test scores were converted to deviation scores with a standard deviation of 16, there would be less difficulty interpreting the many IQ's now appearing in transfer students' cumulative records based upon different IQ tests.
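A deviation IQ is a standard score rescaled to mean 100 and, following the Stanford-Binet convention mentioned above, standard deviation 16. It measures the pupil's distance from the mean of his own age group. This minimal Python sketch uses hypothetical age-group statistics. The two-thirds check assumes that the scores are roughly normally distributed, which the bulletin does not state.

```python
from statistics import NormalDist


def deviation_iq(raw: float, age_mean: float, age_sd: float, scale_sd: float = 16.0) -> float:
    """Place a raw score on a scale with mean 100 for the pupil's own age group."""
    if age_sd <= 0:
        raise ValueError("age-group standard deviation must be positive")
    return 100 + scale_sd * (raw - age_mean) / age_sd


# Hypothetical age group whose raw scores average 40 with a standard deviation of 8.
print(deviation_iq(48, 40, 8))  # 116.0  (one SD above the age mean)
print(deviation_iq(32, 40, 8))  # 84.0   (one SD below)

# If IQs are roughly normal with mean 100 and SD 16, about two-thirds fall in 84-116.
scale = NormalDist(mu=100, sigma=16)
print(round(scale.cdf(116) - scale.cdf(84), 3))  # 0.683
```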

However, not all of the intelligence tests developed by different publishers have been equated in terms of the above standard score scale. Further differences in the meaning of IQ scores occur because the norms are based upon different samples of the population and give different mental ages. It is possible for a pupil to have an IQ of, for example, 120 according to one test and an IQ of 112 according to another. Another pupil might have an IQ of 92 on the first of these same two tests, and an IQ of 100 on the other. Thus, the counselor who uses these results, or interprets them to teachers and parents, must always know the name of each test used. He can then make his own mental correction or adjustment so that the results become more meaningful. The counselor should also know when each of the several IQ tests was given, so that he can note discrepancies or expected differences which may have occurred. Therefore, the complete name of each test and its date of administration should always be entered in each student's cumulative record. It would also be helpful to know whether the test was administered by a teacher, principal, psychometrist, or school psychologist. Then, if there seem to be any discrepancies between the test scores, the interpreter might immediately recognize the source of unusual error.

Because of the misunderstandings which have arisen over the meaning and use of the IQ, many schools are currently administering scholastic aptitude tests rather than IQ or intelligence tests. The results of such tests are not reported in terms of an IQ; they are most often reported as percentile ranks. The percentile rank is the percent of scores in a national or local distribution of scores which are equal to or lower than the score corresponding to the given rank. Thus, if a student's percentile rank on a test is 75, then his score is equal to or better than 75 percent of those scores made on the same test in either the national or local distribution.
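The definition of percentile rank given above, the percent of norm-group scores equal to or lower than a given score, translates directly into code. This minimal Python sketch uses a hypothetical local norm group. Some publishers count only half of the tied scores; that is a different convention and is not what is computed here.

```python
from bisect import bisect_right


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percent of norm-group scores equal to or lower than `score`."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    ordered = sorted(norm_scores)
    at_or_below = bisect_right(ordered, score)  # count of scores <= score
    return 100 * at_or_below / len(ordered)


# Hypothetical local norm group of eight raw scores.
local = [10, 12, 15, 15, 18, 20, 22, 25]
print(percentile_rank(18, local))  # 62.5
print(percentile_rank(15, local))  # 50.0
```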

Most of the current scholastic aptitude tests include at least two kinds of items: verbal and quantitative. Sometimes some of the quantitative items might be considered verbal items, because so-called arithmetic "story" problems are often included. Naturally, the student must be able to read the problem in order to analyze it and arrive at a solution. It is entirely possible that a student who has poor verbal facility and high mathematical facility might receive a lower score than he deserves. However, most tests of this type will give a verbal and a quantitative score, as well as a score based upon a composite of the two parts, so that the area of strength or weakness may be determined or further explored.

Sometimes the statement is made that mental ability tests given in the lower grades are not valid because the children are too young. However, it must be remembered that, in establishing the norms for these grade levels, other children of the same ages took the test under similar conditions. Thus, the results serve to give a general idea of the capabilities of a student.

In most school systems where a planned testing program has 
been established, it is customary for groups to take scholastic 
aptitude or intelligence tests at regular intervals of 2 or 3 years. 
In some schools, the same test or a higher level of the same test 
series is used. In other schools, it is the policy to use a different 
mental ability test at predetermined intervals. 




IV. Achievement Tests 



Scholastic aptitude tests often serve as predictors of future achievement, while achievement tests measure the actual skills or subject-matter content acquired at any grade level. At the elementary level, achievement tests measure attainment in the basic skills areas, such as reading, arithmetic computation, map interpretation, and spelling. At the secondary level, achievement tests measure attainment in such areas as English, social science, natural science, mathematics, and foreign languages.

By incorporating carefully graded materials, a number of the available achievement test batteries cover a wide range of grade levels, beginning at grade 3 or 4 and continuing through high school or the freshman year of college. Since it is difficult to cover such a large grade range satisfactorily with a single test, a series of tests has been developed in each subject, each test covering several grades, such as grades 4-6, grades 6-8, and so on. When a grade is covered by two overlapping tests, the teacher has a choice. For example, if a teacher has an advanced 6th grade, he might give the test covering grades 6-8; if the group is slow, he may choose the test covering grades 4-6. Other achievement batteries cover either the elementary or the secondary school grades, but not both. If a school uses parallel forms of the same battery at frequent intervals (that is, annually or biennially), it is possible to observe the growth of the student in each of the areas included in the battery.

Test results from a coordinated testing program are most important to the teacher as he tries to group his students, or to discover the weaknesses of each student or of each class in the various subject-matter areas. Summary record charts are often available from the test publisher for recording certain combinations of scholastic aptitude and achievement tests. These forms may be designed to show a student's academic growth profile or to show class strengths or weaknesses. Similar charts can be made by the school or by the teacher to fit the chosen tests. A study of these charts by a test specialist or counselor may suggest irregularities which have occurred in the administration of the tests. For example, if all of the scores of an average class seem to be much higher or lower than would be expected, one might consider whether too much or too little time was permitted for the tests, whether the teacher gave extra help to a class, or whether a teacher failed to follow an important instruction for a particular class.

Since achievement tests can be helpful to teachers, it is impor- 
tant that such tests be selected with care. Before a final choice is 
made, the test content should be compared with the appropriate 
curriculum to determine whether the items included are covered 
in the local program and would be fair to the students. 








V. Scoring a Multiple-Choice Test 

THERE ARE a number of ways to score a multiple-choice test,
whether it be a standardized test or a teacher-made test. One of
the earliest and most widely known methods for scoring a spe-
cifically designed answer sheet which has been marked
with an electrographic (current-conducting) lead pencil is by
means of the IBM 805 test-scoring machine. A punched answer
key, or matrix, which is inserted in the machine permits a small
unit current to flow for each correct answer marked by the
student. The bits of current are added to give a reading on a
meter dial which indicates the total number of correct answers.
If a special scoring formula is required, the machine, when
properly set, will automatically deduct a fraction of a point for
each incorrect answer.
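The text does not say which fraction the machine deducts. A common choice is the "correction for guessing," which subtracts 1/(k - 1) of a point for each wrong answer on items with k choices and leaves omitted items alone. The minimal sketch below assumes that formula; the function name is hypothetical.

```python
def formula_score(num_right: int, num_wrong: int, num_choices: int) -> float:
    """Rights minus a fraction of wrongs (correction-for-guessing formula).

    Omitted items are simply not counted, so they are neither rewarded
    nor penalized.
    """
    if num_choices < 2:
        raise ValueError("a multiple-choice item needs at least two choices")
    # Each wrong answer costs 1/(k - 1) of a point on k-choice items.
    return num_right - num_wrong / (num_choices - 1)


# Example: 40 right and 12 wrong on four-choice items gives 40 - 12/3.
print(formula_score(40, 12, 4))  # 36.0
```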

New electronic scoring machines which are located in several 
test-scoring centers require that the student make an opaque 
mark in the required space on a different type of answer sheet. 
Then an optical scanner, which "reads" these "spots," auto-
matically records the total number of correct answers. The
answers to as many as nine tests of a battery can be marked on
the two sides of a special answer sheet, along with the student's
coded name. The machine will "read" the name and will score
these papers at the rate of more than 6,000 papers an hour. A
computer is used to determine the percentile or standard score 
corresponding to each raw score. 
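How the computer finds the percentile or standard score for each raw score is not described. One common definition of percentile rank counts the norm-group scores below the raw score plus half of those tied with it; one common standard score is the T-score, 50 + 10z. The sketch below assumes both definitions and uses a small, made-up norm group; in practice the publisher's norm tables supply these conversions.

```python
from statistics import mean, pstdev


def percentile_rank(raw: float, norm_scores: list[float]) -> float:
    """Percent of the norm group scoring below `raw`, counting ties as half."""
    below = sum(1 for s in norm_scores if s < raw)
    tied = sum(1 for s in norm_scores if s == raw)
    return 100.0 * (below + 0.5 * tied) / len(norm_scores)


def t_score(raw: float, norm_scores: list[float]) -> float:
    """Standard score with mean 50 and standard deviation 10 in the norm group."""
    z = (raw - mean(norm_scores)) / pstdev(norm_scores)
    return 50 + 10 * z


# Hypothetical norm group of ten raw scores (illustration only).
norms = [52, 55, 58, 60, 60, 63, 65, 67, 70, 74]
print(percentile_rank(60, norms))  # 40.0: 3 below, 2 tied -> (3 + 1) / 10
print(round(t_score(60, norms), 1))
```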

Some test companies have developed answer cards which can
be scored by special mark-sensing machines or optical scanners. 
Such machines have the test data available immediately for 
further statistical analysis. 

A number of new scoring machines continue to appear on the
market. Since they are designed primarily to score teacher-made
rather than standardized tests, these machines are small and in
some cases portable. In most cases, these machines will do only one
thing: give a total raw score. Thus, the teacher cannot use them
to complete an item analysis, which would permit the use of the
test results for diagnostic purposes.
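A total raw score hides which items were missed. Item analysis needs each student's full set of responses; a first step is the proportion of the class answering each item correctly. The sketch below assumes responses are stored as one string of letters per student; the data and function names are hypothetical.

```python
def item_difficulty(responses: list[str], key: str) -> list[float]:
    """Proportion of students answering each item correctly."""
    if any(len(r) != len(key) for r in responses):
        raise ValueError("every response string must match the key length")
    n = len(responses)
    return [sum(1 for r in responses if r[i] == key[i]) / n for i in range(len(key))]


def raw_scores(responses: list[str], key: str) -> list[int]:
    """Total number right per student: all a total-score-only machine reports."""
    return [sum(1 for given, correct in zip(r, key) if given == correct) for r in responses]


key = "ABCD"
sheets = ["ABCD", "ABDD", "CBCA", "ABCC"]  # hypothetical class of four
print(raw_scores(sheets, key))       # [4, 3, 2, 3]
print(item_difficulty(sheets, key))  # [0.75, 1.0, 0.75, 0.5]
```

The raw scores alone do not reveal that the fourth item was missed by half the class; the per-item proportions do.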

One portable test-scoring machine weighing less than 25 pounds
uses a "porta-punch" type card hand-punched by the student. The
operator is required to note visually the "number right" indicated
by a counter mounted on the front of the machine and to write
this number on the answer card before clearing the machine to
score the next card.

Another type of scoring machine is the size of a duplicating
machine and weighs 50 pounds. It operates automatically to score
up to 200 new-type answer sheets for one loading of the machine.
Special lead pencils are not required to mark the answer sheets.
The number of wrong answers and omitted questions is printed
automatically on the answer sheet and the questions which are
missed are automatically marked on each paper. The number of
questions missed by an entire class is recorded on a counter.

Although there are a number of machine procedures for test
scoring, one should not neglect the simplest procedure of all,
hand scoring, which can be used when necessary with both
standardized and teacher-made tests. There are several kinds of
hand-scoring answer keys which may be used, such as the fan
(or accordion) key, the strip key, and the cut-out key. Each of these
keys is designed so that, if properly adjusted, the correct response
for each question will appear near the designated answer space
on the student's paper. The teacher can then make an accurate
comparison of the answer key and the student's responses.
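The comparison a teacher makes with a fan, strip, or cut-out key amounts to checking each response against the key and keeping three counts. The sketch below assumes a blank answer is recorded as "-"; that convention and the function name are hypothetical.

```python
def hand_score(responses: str, key: str, blank: str = "-") -> tuple[int, int, int]:
    """Return (right, wrong, omitted) for one answer sheet."""
    if len(responses) != len(key):
        raise ValueError("answer sheet and key must have the same number of items")
    right = wrong = omitted = 0
    for given, correct in zip(responses, key):
        if given == blank:
            omitted += 1
        elif given == correct:
            right += 1
        else:
            wrong += 1
    return right, wrong, omitted


print(hand_score("AB-D", "ABCA"))  # (2, 1, 1)
```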

It is possible to punch out a blank scoring card, or matrix, to
fit an answer sheet, whether it is homemade or purchased. If it is
a standardized test, it is often possible to take the punched key
which is provided for machine-scoring purposes and use it for
hand scoring by placing it over the answer sheet and counting
the correct answers. (NOTE: Some tests which have many sub-
parts may require one set of keys for machine scoring and a
different set of keys for hand scoring.) Counting correct marks
by 2's, that is, 2, 4, 6, 8, is quicker than counting each correct
response singly. For large scoring jobs an inclined scoring frame
to hold the key and answer sheets will speed the process.







VI. Accuracy of Test Results 

Counselors and teachers who interpret test scores
must remember that a test score does not represent a precise point
on a scale. One must think of the wide mark made by a stub pen-
cil, or an even larger interval or band, as representing the region
which one is certain includes a student's "true" test score. By a
"true" test score one means a number which would represent
exactly the level of ability or achievement which a test is sup-
posed to measure. It is impossible ever to find this "true" score.
However, one can be reasonably certain, with known probability,
that an obtained score does not differ from the "true" score by
more than a certain amount.

The size of the uncontrolled or chance "error" which is inherent in
test scores is described by the "standard error of measurement."
If it were possible to administer the same test to a student
several times, without any learning occurring in between, his test
scores would vary by several points. Therefore, one needs to know
how well a particular test score estimates a student's true score.
This information should be included in tables in the publisher's
manual which accompanies each test. Some publishers show, for
the same test, a different standard error for various parts of the
score distribution.

Consider an illustration of the interpretation of the standard
error of measurement as it relates to the bell-shaped distribution
called the "normal" curve. The key to understanding the mean-
ing of the standard error of measurement is to note that one de-
termines the probability that the obtained score of a student does
not miss its true value by more than a certain specific amount.
For ever larger intervals, one determines this probability by
multiplying the standard error of measurement by ±1, ±2, ±2½
and applying values derived from a normal probability table. For
example, suppose that a student's true score on a test is 75 and
the standard error of measurement is 4. According to the normal
probability table, the chances in this case are approximately 2
out of 3 that the obtained score does not miss its true value by
more than ±4 points. The obtained score of the student would
be somewhere in the range of 71 to 79; i.e., 75 - 1 × 4 = 71 to
75 + 1 × 4 = 79. The chances are approximately 19 out of 20 (or
approximately 95 out of 100) that the obtained score lies within
the range of 67 to 83; i.e., 75 - 2 × 4 = 67 to 75 + 2 × 4 = 83.
Finally, the chances are approximately 99 out of 100 that the
obtained score lies within the range of 65 to 85; i.e.,
75 - 2½ × 4 = 65 to 75 + 2½ × 4 = 85. In most cases, however, it is
sufficient to consider only the range of scores between plus and
minus one standard error of measurement. In the usual situation,
the raw score of the student is the best estimate of the true score.
Thus, if the student's raw score is 75, as in the above example, we
would say that the chances are 2 out of 3 that the true score
would lie between 71 and 79, and so on.
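The three bands in the example can be reproduced in a few lines. The sketch below uses Python's statistics.NormalDist in place of a printed normal probability table; it shows that ±1, ±2, and ±2½ standard errors cover about 68, 95, and 99 percent of cases, which the text rounds to 2 in 3, 19 in 20, and 99 in 100. The function names are hypothetical.

```python
from statistics import NormalDist


def score_band(score: float, sem: float, k: float) -> tuple[float, float]:
    """Interval of plus and minus k standard errors of measurement around a score."""
    return score - k * sem, score + k * sem


def coverage(k: float) -> float:
    """Normal-curve probability that an error is no larger than k standard errors."""
    unit = NormalDist()  # mean 0, standard deviation 1
    return unit.cdf(k) - unit.cdf(-k)


for k in (1, 2, 2.5):
    low, high = score_band(75, 4, k)
    print(f"+/-{k} SEM: {low:g} to {high:g}, probability {coverage(k):.3f}")
```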

In considering the standard error of measurement in terms of
percentile ranks, one would generally have a numerically larger
interval, or band, than that indicated by the standard error in
terms of raw-score units. The percentile band will have the
greatest width at the center of the score distribution, where there
is the largest number of cases, and will be narrower toward the
ends of the distribution. Continuing with the preceding example,
suppose that a raw score of 75 corresponds to the 60th percentile;
then the percentile band for one standard error above and below
this score would be approximately from 48 to 71. For a raw score
of 85, which corresponds to a percentile of 84, the percentile band
would be from 76 to 90.
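Why the percentile band is widest near the middle can be seen by passing the raw-score band through a normal curve. The sketch below assumes, purely for illustration, that raw scores in the norm group are normally distributed with mean 72 and standard deviation 12 (values chosen so that 75 falls near the 60th percentile); a real test's norm table should be used in practice.

```python
from statistics import NormalDist


def percentile_band(raw: float, sem: float, norm_mean: float, norm_sd: float) -> tuple[float, float]:
    """Percentile ranks of raw - SEM and raw + SEM under a normal norm distribution."""
    norms = NormalDist(norm_mean, norm_sd)
    return 100 * norms.cdf(raw - sem), 100 * norms.cdf(raw + sem)


# Hypothetical norm distribution: mean 72, standard deviation 12.
for raw in (75, 85):
    low, high = percentile_band(raw, 4, 72, 12)
    print(f"raw {raw}: percentile band {low:.0f} to {high:.0f} (width {high - low:.0f})")
```

Under these assumed norms the band around 75 is about 25 percentile points wide and the band around 85 about 15, the same pattern the text describes, even though both come from the same 4-point raw-score band.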

The magnitude of the standard error of measurement must be
computed for each test. In some tests it may be five or six raw-
score points. In others, it may be only a point or two. Its value
depends upon the reliability of the test, which is determined and
presented by most publishers, and the variability or standard
deviation of the test scores. If, for the same class, two test-score
distributions were equally variable, the standard error of measure-
ment would be smaller for the test which is more reliable.
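The text says the standard error depends on the reliability and the standard deviation but gives no formula. The usual classical-test-theory relation is SEM = SD × √(1 − r), where r is the reliability coefficient. The sketch below assumes that relation and illustrates the last sentence of the paragraph with made-up values.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), the usual classical-test-theory formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


# Two tests with equally variable scores (SD = 10), different reliabilities.
print(round(standard_error_of_measurement(10, 0.84), 2))  # 4.0
print(round(standard_error_of_measurement(10, 0.91), 2))  # 3.0
```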

By using the standard error of measurement, the teacher or
counselor examining the test results may know the range, or band,
of possible ability or achievement suggested by each score on the
test. For this reason, each teacher should read the test interpreta-
tion section of the manual which accompanies each test. Most
test publishers have taken great care to compute and communicate
the standard error of measurement for each test or subtest. Other
information which will make test results more meaningful, such
as prediction tables, intercorrelation matrices, and sample applica-
tions, is also often included in the manual.





Additional errors in scores may be introduced in administering
the test if the administrator does not read the manual and follow
its crucial instructions. For example, the teacher must adhere
exactly to prescribed time limits and must continually proctor the
students during the testing period. When a test specialist or
counselor notices that most of the test scores of a class appear
to be much higher or lower than expected, he should check im-
mediately with the test administrator to determine any irregulari-
ties in test administration which could affect the students' scores.

Another source of error is inaccurate scoring. Trained clerks
are generally more accurate than teachers. For tests which must
be scored by hand, every test paper should be independently
scored twice, preferably by a different person each time. When
the tests are scored by the IBM test-scoring machine, it is recom-
mended that at least every tenth paper be checked a second time.
Some scoring services perform the operation twice on different
machines. Scoring by means of the new high-speed electronic
machines is extremely accurate.
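The checks recommended here (two independent scorings of every hand-scored paper, and a second look at every tenth machine-scored paper) are easy to organize once scores are listed by paper. The sketch below is a hypothetical helper, not a procedure given in the text: it lists papers whose two scorings disagree and picks every tenth paper for rechecking.

```python
def disagreements(first: dict[str, int], second: dict[str, int]) -> list[str]:
    """Paper IDs whose two independent scorings differ or lack one of the scores."""
    ids = set(first) | set(second)
    return sorted(pid for pid in ids if first.get(pid) != second.get(pid))


def every_tenth(paper_ids: list[str], start: int = 0) -> list[str]:
    """Papers to recheck after machine scoring: every tenth one in file order."""
    return paper_ids[start::10]


first_pass = {"p01": 40, "p02": 37, "p03": 52}
second_pass = {"p01": 40, "p02": 38, "p03": 52}
print(disagreements(first_pass, second_pass))  # ['p02']
```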

Additional errors may occur during the conversion of a raw
score into a more meaningful score, such as a percentile or a grade
equivalent. Such computations should be double-checked. Errors
in hand transcription to cumulative or other records must be
eliminated, or at least diminished, by double-checking all entries.
Although high-speed electronic scoring procedures may reduce
errors of scoring and norming to a minimum, scores entered by
hand in the cumulative record must be checked unless individual
score reports are available on the recently developed pressure-
sensitive press-on labels as part of the scoring machine's high-speed
printer output.

In summary, before teachers, counselors, or administrators be-
gin the interpretation of recorded test scores, they must have
confidence that, except for known error, no additional errors have
been introduced into the test results because of improper ad-
ministration, inaccurate scoring, failure to read the appropriate
norm tables, or the incorrect transcription of scores to permanent
cumulative record folders.







VII. Analysis of Class Achievement 

A TEACHER OR COUNSELOR knows that standardized test
scores are only a portion of the many systematically recorded bits 
of information and observations concerning the ability and 
promise of a student. These data are available in the cumulative 
record folder of each student. 

Individual test scores may be interpreted in terms of national
norms, or local norms which may be based upon a class, a single
school, or a complete school system. For maximum effectiveness,
both standardized and teacher-made tests should be analyzed as
soon as the scores are available.

By the analysis of standardized and teacher-made test results,
a number of questions similar to those given below can be
answered:

Is the student working up to the level of his ability?

What is the probability of student success for different subjects at the
next grade level?

What kinds of items are missed most frequently on standardized tests?

What are the most common misconceptions or most frequent student
errors in each class in the main divisions of each curriculum?

How can the teacher determine his best test items and the form and
content of those items which need to be changed in order to give a
better evaluation of each student?

There are several procedures (scattergram analysis, prediction
tables, error analysis, and item analysis) which can help the
counselor or teacher to answer such questions. Each of these sug-
gested procedures can be completed in a reasonable amount of
time and may be applied to either standardized or teacher-
made tests.

Scattergram Analysis

Two-way charts, or scattergrams, for a class are constructed to
picture for each student his relative score position (a raw score,
scaled score, percentile rank, or grade) with respect to any two
of the following items:




1. A scholastic aptitude test. 

2. An achievement test. 

3. A term or semester grade for a course in which he is currently
enrolled. 

4. The class grade or test score of the student at a later time in his 
school career. 

A scholastic aptitude test often includes subscores of verbal
ability and quantitative ability as well as a total score. One of
these scores is often plotted on a chart along with another score
or grade made by a student. The pairs of test scores for a num-
ber of students can be plotted in the same chart. For example,
one could plot the verbal portion of an aptitude test together with
the test score on an English or social science test or with a grade
received in either of these subjects.

Similarly, the quantitative score of the aptitude test can be 
plotted together with a test score or grades in arithmetic, algebra, 
geometry, or one of the sciences. Each teacher would ordinarily
plot only those scattergrams related to his own teaching field or
to a subject with which many students in his homeroom class may 
be having difficulty. The counselor, on the other hand, can use the 
appropriate charts from the different fields when he talks with 
teachers, students, or parents. 

The method for constructing a scattergram can be illustrated 
briefly. On large-squared graph paper, draw a vertical line near 
the left side of the paper superimposed on one of the printed 
rulings; draw a horizontal line near the bottom of the sheet 
joining the vertical line. These two lines, intersecting at right
angles at the lower left-hand corner, are called the axes. 

The scores for one type of test (e.g., an aptitude test) can
be plotted with respect to one axis while scores for a second type
of test with which the first is being compared (e.g., an achieve-
ment test) can be plotted with respect to the other. While it
does not matter with respect to which axis either of the two sets
of scores is plotted, one type of score should be consistently placed
along either the horizontal or the vertical axis only. If the deci-
sion is to plot scholastic aptitude scores along the horizontal
axis (whether verbal, quantitative, or total), then subject-matter
achievement test scores would be represented along the verti-
cal axis.

As an example, assume that percentiles are available for each
student on the verbal portion of a scholastic aptitude test and an
achievement test in English.¹ Beginning at the point of intersec-
tion of the two axes, mark off the decile (tens) points along each
axis. Percentiles on the horizontal axis then proceed from the
lowest on the left to the highest on the right, while percentiles on
the vertical scale will rise from the lowest at the bottom to the
highest at the top.² Heavy rulings mark the 50th percentile on
each axis.

¹ The same procedure would be followed if the scores were in raw
scores, standard scores, or grades.




Figure 1. Method for Plotting Pairs of Percentiles on a Scattergram
(horizontal axis: Scholastic Ability Test, Verbal, in percentiles)

There are several ways to plot the pairs of scores. One method
of tabulating the numerous pairs of scores of the students in a
large school, or several classes together, is to make a tally mark
(/) in the appropriate square for each pair of scores. Three
pairs of percentiles are plotted in figure 1. Since student 1 has a
percentile of 41 on the verbal aptitude test and 65 on the English
test, a tally (/) is made in the square which lies at the intersec-
tion of the vertical column between 40 and 50 and the horizontal
row between 60 and 70. The plotted percentiles of student 2
are 75 on the verbal test and 95 on the English test. The plotted
percentiles of student 3 are 10 and 20. (Note: The arbitrary
rules are applied by which one plots a percentile which falls on a
vertical line in the column to the right and a score which falls on
a horizontal line in the row above the line; i.e., the scores 0-9 are
plotted in one column, 10-19 in the next, and so on.) Such a
scattergram presents a good picture of the relationships between
the two tests for a large group of students.

² If grades are to be recorded instead of percentiles, the lowest
grade (F) should be placed at the intersection of the axes, or zero
position. Letter grades on the horizontal axis must be placed in
the order F, D, C, B, A so that the interpretations which are
suggested for scattergrams presented in terms of percentiles may
apply.
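The tallying rule, including the convention that a percentile lying on a ruling goes into the cell to the right or above, can be written out directly. The sketch below uses the three students plotted in figure 1; the text layout of the printed chart and the function names are illustrative assumptions.

```python
def decile(percentile: int) -> int:
    """Cell index 0-9; a value on a ruling (40, 50, ...) goes to the cell
    to the right or above, so 40-49 -> 4. A percentile of 100 is kept in
    the top cell."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    return min(int(percentile // 10), 9)


def tally(pairs: list[tuple[int, int]]) -> list[list[int]]:
    """grid[row][col]: row = English decile (vertical), col = verbal decile."""
    grid = [[0] * 10 for _ in range(10)]
    for verbal, english in pairs:
        grid[decile(english)][decile(verbal)] += 1
    return grid


def render(grid: list[list[int]]) -> str:
    """Text chart with the highest row at the top, one '/' per tally."""
    lines = []
    for row in range(9, -1, -1):
        cells = "".join(f"{'/' * grid[row][col]:>3}" for col in range(10))
        lines.append(f"{row * 10:>2}-{row * 10 + 9:<2}|{cells}")
    lines.append(" " * 6 + "".join(f"{col * 10:>3}" for col in range(10)))
    return "\n".join(lines)


# Figure 1 students: (verbal aptitude percentile, English percentile).
grid = tally([(41, 65), (75, 95), (10, 20)])
print(render(grid))
```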

For a smaller group, such as a single classroom, another method
may be used (figure 2). If all of the students in a class are listed
alphabetically and assigned a number, then it is possible to
identify each plotted position on the chart by placing the ap-
propriate number beside it. In some classes indicating the sex
of the student in each position may be of interest. This can be
done in any one of several ways:

1. Make a square for a male and a circle for a female.

2. Assign odd numbers to the males and even numbers to the females.

3. Record the numbers in color code: blue for males and red for
females.

Method 1 (used in figure 2) permits the addition of another 
piece of useful information to the scattergram, namely, the 
teacher's grade at the last marking period or at the end of the 
semester. As shown in figure 2, one can have the following visible
information on the chart for each position: a number coded to 
the student, the sex of the student, a percentile on each of two 
tests, and a teacher’s grade. 

A diagonal line has been drawn in figure 2 from the lower
left-hand corner to the upper right-hand corner indicating points
where a student's verbal scholastic aptitude and English achieve-
ment percentiles are approximately the same. That is, a student
whose plotted scores lie on this line would have such percentile
scores as 25 on the verbal and 25 on the achievement test in
English or 72 on the verbal and 72 on English, and so on. If each
pair of scores were the same for each student, a statistician
would say that there is a perfect positive relationship or a correla-
tion of 1.00. This rarely occurs in practice.
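The correlation the statistician refers to is usually Pearson's product-moment r. A minimal sketch, written out by hand so each step is visible (Python 3.10 and later also provide statistics.correlation), using hypothetical pairs of percentiles:

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two pairs of scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


verbal = [25, 72, 50, 90]
print(pearson_r(verbal, verbal))  # identical pairs give 1.0 (perfect positive)
print(round(pearson_r(verbal, [30, 60, 55, 70]), 2))
```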

In figure 2 a band is formed by drawing a dotted line on each
side of the diagonal and parallel to it. Although the exact loca-
tion of these lines would vary somewhat with the tests or other
measures used, the band plotted here suggests that considera-
tion must be given to the fact that all scores may be subject to
uncontrolled errors.

Figure 2. Method for Plotting Pairs of Percentiles on a Scattergram
for a Single Class (square = male, circle = female; letter inside
symbol is grade for course; number outside symbol identifies student)

For example, a student who scores in the lower quarter of the
class in scholastic ability may also score in the lower quarter of
the class on the achievement test. Such a student is female No. 14
in figure 2. Since these two scores lie within the band and are of
approximately the same magnitude, her achievement would be
interpreted as consistent with her ability. In fact, any student
whose scores fall within the band can be considered as working
at the expected level of achievement. Similarly, a student who
ranks high on one test would be expected to rank high on the
other, for example, female No. 2 in figure 2.

Students who are below the error band, such as No. 4 and No.
13, are probably underachievers. When students are under-
achieving, they should be referred for counseling and some at-
tempt made to find the causes of the difficulty. Sometimes the
student may have been ill and missed certain fundamental lessons.
Since succeeding assignments assume knowledge of these basic
materials, the student fails to accomplish current requirements
and falls further and further behind, as shown by an achieve-
ment test. Of course, the teacher should try immediately to dis-
cover and to correct such difficulties.

Note that No. 4, who is high in verbal ability and low on the 
English achievement test, received a "D" at the last marking
period. She is in the top sixth of her class in verbal scholastic 
ability but is only operating in the lower part of the bottom 
quarter of the class on a related English skill. Perhaps this stu- 
dent needs special help to improve this skill. On the other hand, 
it may be that she has not applied her apparently high verbal
ability to school tasks and merely needs more encouragement and 
challenge to get down to work. Or, perhaps, some other difficulty 
can be discovered in an interview with her. The scattergram 
does not identify the specific difficulty a student may have, but 
it often calls attention to students who have problems. 

Students in the upper left-hand section of figure 2 (No. 5 and
No. 8) above the error band are often considered to be "over-
achievers," if it is possible to accept the idea of a child
"overachieving." As Froehlich and Hoyt have suggested, "It
must be recognized that the term 'overachiever' is a relative con-
cept, for no one exceeds his capabilities in achievement. As used
in this discussion, it connotes that a student is achieving rela-
tively better than others in the group with like capacity."³
Overachievement may occur when a student devotes an unusual
amount of time and effort to schoolwork outside of school to meet
certain classroom standards. Sometimes the question arises
whether or not this should be encouraged if the child's health
is being affected. Perhaps the child is trying to compensate for
lack of acceptance among his peers by excelling in his lessons.

³ Froehlich, Clifford P., and Hoyt, Kenneth B. Guidance Testing and
Other Student Appraisal Procedures for Teachers and Counselors.
3d ed. Science Research Associates.




There may be a personality problem which needs attention. Or
the youngster may be highly motivated to achieve because of an
overwhelming interest in a given area. Again, it must be em-
phasized that the scattergram will not solve a problem, but will
call attention to problem areas.
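Sorting students into the three regions of figure 2 (within the band, below it, above it) can be sketched as a comparison of the two percentiles. Because the text notes that the exact location of the dotted lines varies with the tests used, the band half-width of 15 percentile points below is an arbitrary assumption, and the students are hypothetical. As the text stresses, the labels only flag students for a closer look.

```python
def band_position(aptitude_pct: float, achievement_pct: float, half_width: float = 15) -> str:
    """Place a student relative to a band drawn parallel to the diagonal.

    The vertical distance from the diagonal is achievement minus aptitude;
    half_width is a hypothetical band half-width in percentile points.
    """
    gap = achievement_pct - aptitude_pct
    if gap < -half_width:
        return "below"   # possible underachiever: refer for a closer look
    if gap > half_width:
        return "above"   # achieving better than others of like ability
    return "within"      # achievement consistent with ability


# Hypothetical students: (name, verbal aptitude percentile, English percentile).
students = [("S1", 20, 22), ("S2", 86, 8), ("S3", 30, 70), ("S4", 90, 92)]
for name, apt, ach in students:
    print(name, band_position(apt, ach))
```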

In some class situations it may be helpful to know whether
there is a difference in the aptitudes and accomplishments of 
boys and girls. Color coding or the use of visible symbols 
could make such information readily available. In figure 2, a 
square is used to indicate a male student and a circle a female 
student. One might discover that the boys are doing better in 
science than the girls, with one or two exceptions; or the girls 
may excel the boys in art, but one boy may be better than the 
best girl in the art class. 

The inclusion of the grades with the test scores in figure 2
can point up some special problems. Why did No. 3, a boy, receive
a C, when he seems to have a high aptitude for English and does
well on the achievement test? Does he have personality charac-
teristics which clash with the teacher? Or his peers? Does he
cause trouble in class? Does his grade include a number of non-
academic components other than measures of English accom-
plishment?

On the other hand, why did student No. 12, a girl, who is below
the median, or 50th percentile, in scholastic ability and in the
lowest quarter in English achievement, receive a grade of B? Did
she really do unusually well during the last marking period be- 
cause of long study hours, or were other, nonacademic, factors at 
work here? 

It is not to be inferred from these questions that students 
should be graded on the basis of test scores alone. Certainly, 
daily work, class participation, and the quality of class projects 
must be included in the term mark. However, the teacher should
be aware of such grading discrepancies when they occur.

Prediction Tables

The scattergram method is particularly useful as the basis for
developing a prediction table for the probable success of students
in a particular subject or for their probable total grade-point
average in succeeding grades. The more cases that are accumulated
for deriving a prediction table, the better the prediction will be.

When one begins to develop prediction tables, pairs of values
are needed. These pairs may consist of two test scores, two
grades, or a test score and a grade. Often such pairs of values are
available for only a limited number of students. In such a situa-
tion, the first derived results must be used with caution and with
an awareness that there is always a certain amount of error as-
sociated with tables of this type. However, it is helpful to have
some information from which one can gain insight.

Ordinarily, in any one school the students of a particular grade
are very similar to those who have passed through the school in
the preceding 2 or 3 years or who will be enrolled in the following
years. This assumption is fundamental in the construction and
the use of prediction tables. Thus, it is important that new tables
be constructed each year to include the most recent class upon
which information is available. As the results of later tables are
based upon larger numbers of students, the percent of probable
success will tend to stabilize. This will be obvious because the
prediction percents in each square, or cell, of the table may
shift only a few points or not at all. If the character of the
student population in a particular school changes because of
economic or other reasons, new tables must be immediately com-
puted based upon this new group.

It is impossible to compute probability tables to be used to
guide those students in the present ninth grade as they prepare
for tenth-grade work, unless comparable information is available 
on the class currently enrolled in the tenth grade, which is the 
same class which was enrolled in the ninth grade last year. If one 
were considering grade-point averages, this class must have al- 
ready completed the tenth grade and the final grades must have 
been entered on the cumulative record cards. 

As an example, suppose that two grade-point averages are
available for each of 50 students in the same school for the ninth
grade and the tenth grade. The counselor desires to know the
probability of a student in the ninth grade with a certain
grade-point average achieving success in his tenth-grade work.
Since the identification of which student made a certain average
is not of interest, the first step is to make tallies indicating
the pairs of averages for each student in the appropriate
block or cell. These tallies may be replaced by a single summary
number in the proper cell, as shown in part "A" of figure 3. The
ninth-grade average appears in the center of this table. The
totals in the left-hand column labeled "row sum" and in the
lowest row labeled "column sum" indicate the total number of
cases in each row or column. The number "50" in the lower
left-hand outside corner is the sum of the rows or columns and
provides a check on the number of entries.

Part "A" (left): NUMBER of students with each 10th-grade average
Part "B" (right): PERCENT of students with each 10th-grade average

   Row                           9th-grade                           Total row
   sum    F    D    C    B    A   average       F    D    C    B    A  percent
     5                   2    3      A                        40   60      100
    10              3    6    1      B                   30   60   10      100
    20    1    4   10    5           C          5   20   50   25           100
    10    2    3    5                D         20   30   50                100
     5    2    2    1                F         40   40   20                100
   50*    5    9   19   13    4   Column sum

* Sum of row or column sums.

Grade-point average: F = 0.00-0.50; D = 0.51-1.50; C = 1.51-2.50;
B = 2.51-3.50; A = 3.51-4.00

Figure 3. Expectancy Table for Predicting the Probability That a Student
with a Certain Grade-Point Average in the Ninth Grade Will Obtain a
Certain Grade-Point Average in the Tenth Grade
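The tallying step described before figure 3 can be sketched by converting each grade-point average to a letter with the ranges given in the key under the figure and counting pairs. The boundaries follow the figure; treating a value such as 0.505, which falls between two printed ranges, as belonging to the lower letter is an assumption, and the sample pairs and function names are hypothetical.

```python
GRADES = ["F", "D", "C", "B", "A"]
# Upper limit of each letter's grade-point range, from the key under figure 3.
UPPER = [(0.50, "F"), (1.50, "D"), (2.50, "C"), (3.50, "B")]


def letter(gpa: float) -> str:
    if not 0.0 <= gpa <= 4.0:
        raise ValueError("grade-point average must be between 0.00 and 4.00")
    for upper, grade in UPPER:
        if gpa <= upper:
            return grade
    return "A"  # 3.51 through 4.00


def expectancy_counts(pairs: list[tuple[float, float]]) -> dict[str, dict[str, int]]:
    """Part "A": rows are ninth-grade letters, columns tenth-grade letters."""
    table = {row: {col: 0 for col in GRADES} for row in GRADES}
    for ninth, tenth in pairs:
        table[letter(ninth)][letter(tenth)] += 1
    return table


# Hypothetical (ninth-grade GPA, tenth-grade GPA) pairs.
counts = expectancy_counts([(3.6, 3.2), (3.7, 3.9), (2.0, 2.2), (1.9, 0.4)])
print(counts["A"])  # {'F': 0, 'D': 0, 'C': 0, 'B': 1, 'A': 1}
```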



The appropriate row sum is used as the divisor to determine
the percents placed in corresponding cells in part "B" of figure 3.
For example, the 2 under B in the top row of "A" is divided
by the 5 and multiplied by 100 to give 40 percent. That is,
2/5 × 100 = 40%. The 40 is entered in the first row under B in
part "B" of figure 3. Then the 3 under A in the first row of "A" is
divided by 5 and multiplied by 100 to give 60 percent. That is,
3/5 × 100 = 60%. The 60 is entered under A in the first row in
part "B". The sum of 40 plus 60 gives 100, as indicated in the
"total row percent" column. This total indicates that the per-
cents for this row have probably been computed correctly. In the
second row from the top of "A", the row sum is 10. Similar com-
putations may be made as before. Thus, 3/10 × 100 = 30%, which
is entered under C in the second row of part "B". And so on. The
right-hand column totals of part "B" all add to 100 percent for
each row, which serves as a check.⁴
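The arithmetic of part "B" can be checked mechanically. The sketch below enters the counts from part "A" of figure 3, divides each by its row sum and multiplies by 100, and confirms that every row totals 100 and that the counts total 50. Rounding to whole percents is included as the usual practice; for these particular counts every percent is already a whole number. The names are hypothetical.

```python
GRADES = ["F", "D", "C", "B", "A"]
# Part "A" of figure 3: ninth-grade average -> counts of tenth-grade F, D, C, B, A.
COUNTS = {
    "A": [0, 0, 0, 2, 3],
    "B": [0, 0, 3, 6, 1],
    "C": [1, 4, 10, 5, 0],
    "D": [2, 3, 5, 0, 0],
    "F": [2, 2, 1, 0, 0],
}


def row_percents(row: list[int]) -> list[int]:
    """Each cell divided by the row sum and multiplied by 100, rounded."""
    total = sum(row)
    return [round(100 * count / total) for count in row]


PERCENTS = {grade: row_percents(row) for grade, row in COUNTS.items()}
column_sums = [sum(row[i] for row in COUNTS.values()) for i in range(len(GRADES))]

for grade, row in PERCENTS.items():
    print(grade, row, "total", sum(row))
print("column sums", column_sums, "grand total", sum(column_sums))
```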

Part "B" of figure 3 is read as follows: If a student made an A
in the ninth grade, the chances of making an A in the tenth grade
are 60 out of 100, the chances of making a B are 40 out of 100,
and the chances of making a B or better are 60 + 40, or 100
chances in 100. If a student made a C in the ninth grade, the
chances of earning an F in the tenth grade are 5 in 100; of
making a D are 20 in 100; of making a C are 50 in 100; of making
a B are 25 in 100 (or 1 chance in 4); of making a C or better are
50 + 25, or 75 in 100 (or 3 chances out of 4); etc.
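Reading "B or better" or "C or better" from part "B" means adding the percents from that grade through A. A short sketch using two rows of part "B" of figure 3 (the helper name is hypothetical):

```python
GRADES = ["F", "D", "C", "B", "A"]
# Rows of part "B" of figure 3 (percents for tenth-grade F, D, C, B, A).
ROW_NINTH_A = [0, 0, 0, 40, 60]
ROW_NINTH_C = [5, 20, 50, 25, 0]


def chances_at_least(percent_row: list[int], grade: str) -> int:
    """Chances in 100 of earning `grade` or better in the tenth grade."""
    return sum(percent_row[GRADES.index(grade):])


print(chances_at_least(ROW_NINTH_A, "B"))  # 100
print(chances_at_least(ROW_NINTH_C, "C"))  # 75, i.e., 3 chances out of 4
```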

A table similar to that just described could be constructed by 
any counselor on the basis of high school grade averages of all 
college-going seniors from his school who have completed 1 college 
year and for whom first-year college-grade averages are available.
(The high school grade average would be represented in the
middle column between "A" and "B".) One must recognize the
inherent inaccuracy of such a table when one combines college 
freshmen grade averages from many different schools with 
varying standards. Part of this error can be eliminated, however, 
by constructing a table based upon those students who have 
entered and remained in a single nearby college. 

It is possible to construct any number of related scattergrams,
such as English grades in high school versus English grades in
college, or English test scores in the tenth grade in high school
versus the course grades in college freshman English, or the
grades in high school mathematics versus the grades made in
the college freshman mathematics course. A study of such rela-
tionships might suggest curriculum changes to the school staff
or course changes for students in a college-preparatory curri-
culum.

Similar tables based upon scattergrams could be produced by the registrar or admissions officer of a university or college in order to predict a college grade-point average on the basis of high school grades or scores on required admissions tests. With such a procedure, it would be possible to accumulate data on a number of different high schools and to predict the probability of success for students from each high school. Such tables could be constructed annually so that the information for each school would be current and any changes or trends could be noted. A table based upon the data accumulated for several years could be developed and should be more stable than one based upon a single year. It would be a service to the high school counselor if the college would circulate such tables to each high school for which a table is constructed. With the automatic data processing equipment now available in many institutions of higher education, such tables as the foregoing would be relatively easy to develop. Some of the college admission testing programs are now making such information available to the high schools and to the institutions of higher education.

¹ In practice, each entry is rounded to the nearest percent. In this case, the row total may become 99 or 101 unless arbitrary adjustments are made so that the total is 100. In making such adjustments, a change in the largest percent value gives less relative error.

Some high schools and many universities give orientation period tests to all entering students. These test results can be related to freshman grade-point averages or to specific course grades. Such derived predictions of high school or college success should be made available to the appropriate counselors.

A collection of probability or prediction tables of the types suggested here would be most helpful to the high school counselor. He could use them to help a student become aware of his probability of success at a particular college. It is possible that a student who would be almost at the bottom of his entering class at one of the highly selective private universities could well be at the middle, or much above average, in his class at another institution. The student should then be able to make an "educated guess" as to the school where he has a good chance of being admitted and where he would be challenged to do his best work.

Error Analysis Made Outside of the Classroom

Most standardized achievement tests are designed to cover a rather broad area of a subject-matter field. A diagnostic test, on the other hand, is constructed so that certain important sections of the subject are covered a number of times from different points of view in an attempt to define specific areas of deficiency. Instruction in such areas can then be emphasized further by the classroom teacher. It is often possible, however, to use an achievement test for diagnostic purposes. The procedure described here can also be used with a teacher-made test.

The principal function of an error analysis is to obtain a summary picture of the items missed most frequently by the class. At the same time, it is possible to note which students miss or omit certain types of items.

In setting up the table of errors, a teacher should examine each test item in order to determine the topic being covered. The time needed to determine the topic for each test item can be shortened by having several teachers work together. For some standardized tests it is possible to use the analysis of items which the publisher may have included with the administration manual, the answer key, or the interpretation manual. At least one publisher includes a short topical description with a carbon-marked duplicate answer sheet designed for the teacher's and student's use. At least one test-scoring firm includes an error analysis as one of its service options. By means of currently available electronic data-processing systems, it would be possible to group the analysis of similar test items in adjoining columns on the report sheet.

If a teacher must design his own table of errors, he may find it helpful to identify groups of related item numbers by using a color code or light shading. In figure 4, which represents part of a table of errors, one group of related items has been shaded.

[Figure 4. Table Showing Errors Made on a Test of Arithmetic Concepts. Columns: questions 1 through 6 and 34 through 36, grouped under "Topic or Concept," followed by "Error total" and "Omit total." Rows: one per student, then "Errors" and "Omits" rows giving the count for each question. Key: ✓ errors, 0 omits, shading marks related items.]

The "Error total" column on the right side of the table indicates how many questions were marked incorrectly by the student, while the "Omit total" column indicates how many items were not attempted. Since the teacher is aware of the amount of time allotted for a test and whether or not each student had ample time to complete the test, he can judge whether omitted items are an indication of a lack of knowledge or a shortage of time. Since the total number of correct items is recorded on the student's answer sheet, it is not necessary to indicate it in the table. This total could easily be obtained by subtracting the error total plus the omit total from the number of questions in the test. Since the totals given in figure 4 are for a complete table, the reader may not be able to verify all of them from the part shown.
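The subtraction described above can be sketched for one row of the table. This assumes a hypothetical encoding in which each student's row records only the questions missed ("E") or omitted ("O"), so any question not listed was answered correctly.

```python
def student_totals(marks, num_questions):
    """Error total, omit total, and number correct for one student's row.

    `marks` maps question number -> "E" (error) or "O" (omit); questions
    that do not appear were answered correctly (hypothetical encoding).
    """
    errors = sum(1 for m in marks.values() if m == "E")
    omits = sum(1 for m in marks.values() if m == "O")
    correct = num_questions - errors - omits
    return errors, omits, correct


# Hypothetical student: missed items 5 and 34, omitted item 36, on a 36-item test.
print(student_totals({5: "E", 34: "E", 36: "O"}, 36))  # (2, 1, 33)
```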

The "Errors" row at the bottom of the table indicates the number of students missing each question. For example, 15 students of this class of 36 missed item 5 on averaging, and 3 students omitted it. Relatively few students missed the other items on averaging, questions 3 and 36. However, 8 students omitted question 36. As a teacher, one would be concerned because of the large number of errors on question 5 and the number of omits for question 36.
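The "Errors" and "Omits" rows are column counts over the same table. A sketch under the same hypothetical encoding (each student's row maps question number to "E" or "O"); the three-student table is invented for illustration and is not the data of figure 4.

```python
from collections import Counter


def item_counts(table):
    """Build the "Errors" and "Omits" rows: counts per question over all students.

    `table` maps student name -> {question number: "E" or "O"}.
    """
    errors, omits = Counter(), Counter()
    for marks in table.values():
        for question, mark in marks.items():
            if mark == "E":
                errors[question] += 1
            elif mark == "O":
                omits[question] += 1
    return errors, omits


# Hypothetical three-student table.
table = {
    "Student 1": {5: "E", 36: "O"},
    "Student 2": {5: "E"},
    "Student 3": {3: "E", 5: "O"},
}
errors, omits = item_counts(table)
print(errors.most_common(1))  # [(5, 2)]
print(omits[5], omits[36])  # 1 1
```

`most_common` lists the questions in order of how often they were missed, which is the order in which a teacher would review them.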

Questions 34 and 35 are of interest here, since more than half of the class tried the items and missed them, while a number of others decided to omit them, especially item 34. There are several possible explanations. First, the topics were difficult for the students. Second, these topics had not yet been presented but were to be included later in the course. This latter explanation may be especially true if a standardized test is administered early in the fall or in the middle of the school year, or if the test is constructed to cover the work of several years.

With an analysis such as that suggested by figure 4, the teacher can quickly determine the areas of strength for the class and the most commonly missed items. Students with common areas of weakness can be given special instruction in small groups. A minimum of class time should be spent discussing questions missed by only a few students. Questions on previously presented concepts which were missed or omitted by a large number of students must be examined again. However, items which anticipate topics to be developed in a later part of the course should be considered at the appropriate time.






Error Analysis Made Inside the Classroom

Paul B. Diederich suggests that an error analysis of a test can be done during classroom time by having each pupil watch a paper other than his own.² If the teacher is only interested in an overall "error analysis," i.e., in how many pupils chose any one of the wrong responses to a test question, then the teacher only needs to call out, "Item 1, 'b' is the correct answer. Each of you holding a paper in which item 1 was missed, raise your hand." Then he, or a class monitor, can quickly count the raised hands, record the number beside the test question, and proceed to the other questions. Thus, in a few minutes, the items missed by the greatest number have been identified. After the papers are returned to the students, the teacher can quickly go over those questions which were missed most often and explain why the wrong choices are incorrect.

Item Analysis Methods 

There are several methods for analyzing objective test results which make it possible to determine one or more of the following points:

1. Difficulty of an item: The percent of the students of the class answering the question correctly.

2. Discriminating power of the correct answer: The capacity of an item to distinguish between good and poor students; the percent of the highest scoring students answering the question correctly as compared with the percent of the lowest ranking students answering the question correctly.

3. Use of each response for each test item: The number of students selecting each response (each response should be chosen at least once).

4. Identification of each student making a correct or incorrect choice on each item: Permits an individually designed corrective procedure for each student.
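Points 1 and 2 reduce to simple percentages. The sketch below is illustrative only (the function names are hypothetical); points 3 and 4 are tallies of individual responses and are not shown here.

```python
def difficulty_percent(num_correct, class_size):
    """Point 1: percent of the class answering the item correctly."""
    return 100 * num_correct / class_size


def discrimination_percent(high_correct, high_size, low_correct, low_size):
    """Point 2: percent correct among the highest scorers minus the percent
    correct among the lowest scorers (negative if the low group does better)."""
    return 100 * high_correct / high_size - 100 * low_correct / low_size


# 15 of 20 high scorers and 5 of 20 low scorers answer correctly.
print(difficulty_percent(20, 40))  # 50.0
print(discrimination_percent(15, 20, 5, 20))  # 50.0
```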

In a few school systems it is now possible to carry out an item analysis entirely by means of an attachment to a test-scoring machine or by the use of automatic data processing equipment. In other schools, where such services are not available, it may be necessary to use other methods. In fact, much student interest may be aroused by carrying out such procedures during the classroom period when the scored papers are returned. It has been found that pupils at all grade levels, from the primary grades through graduate school, cooperate willingly. The students are interested in learning how many of their peers missed each item, why they made an incorrect choice, and the best answer for each question. If such an analysis has been completed for the teacher's own objective test, he immediately has information which can assist him to improve his test items for future use. He can then build up a test file of items of known quality and difficulty which will discriminate between his good and poor students.

² Paul B. Diederich, Short-Cut Statistics for Teacher-Made Tests, Princeton, N.J.

High-Low Analysis. For some tests the teacher will find it helpful to use the classroom procedure which Diederich calls a "high-low" type of item analysis. This method will reveal both the difficulty and the discriminating power of each item.

To determine the discriminating power of an item, it is necessary to split the class into two sections: those with high scores and those with low scores. The separation point is the middle, or median, score for the class. To find the median score, the following steps are necessary. Determine the range of scores of the class, that is, the highest and lowest scores, and record them at the top and bottom of the blackboard. Write all possible scores occurring in this interval in a column, beginning with the highest score at the top of the board and continuing to the lowest. Divide the number of class members by two to determine how many papers must be tallied in order to find the middle one. Beginning with the highest score, ask how many students made each score and record the results. As soon as the cumulative total number of papers equals half the class, the middle score can be determined without completing the distribution of scores.
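The blackboard tally for the middle score can be mirrored in code. This sketch (hypothetical function name) assumes whole-number scores and follows the stopping rule in the text: count papers from the highest score down and stop as soon as the running total reaches half the class. For an even-sized class it returns the score at which the split falls, which is what the next step needs, rather than an averaged median.

```python
from collections import Counter


def middle_score(scores):
    """Return the score at which the running count, taken from the highest
    score down, first reaches half the class (whole-number scores assumed)."""
    counts = Counter(scores)
    half = len(scores) / 2
    cumulative = 0
    for score in range(max(scores), min(scores) - 1, -1):
        cumulative += counts[score]  # papers at this score (0 if none)
        if cumulative >= half:
            return score


print(middle_score([30, 28, 28, 25, 22, 20]))  # 28
```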

If there are several students' papers at this middle score, collect these papers first. Then collect all papers in two groups: those above the middle score and those below. Distribute all papers above the median score on one side of the room, and those below the median score on the other side. Then assign the several papers with the median score to the high and low sides at random so that the total number of papers on each side is the same. If there should be an odd number of papers in the class, so that they cannot be evenly divided, discarding the one paper remaining will leave one student free to act as a recording monitor at the board.
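The division into two equal piles can be sketched as follows. The function name and the (name, score) representation are hypothetical; `middle` is assumed to come from the counting procedure described above, so neither pile starts out larger than half the class. Passing a seeded `random.Random` makes the random assignment of tied papers repeatable.

```python
import random


def split_high_low(papers, middle, rng=random):
    """Split (name, score) papers into equal "high" and "low" piles.

    Papers above `middle` go high and papers below go low; papers at `middle`
    are shuffled and used to even up the piles. With an odd class, one paper
    is left over (its owner can serve as the recording monitor).
    """
    high = [p for p in papers if p[1] > middle]
    low = [p for p in papers if p[1] < middle]
    tied = [p for p in papers if p[1] == middle]
    rng.shuffle(tied)
    half = len(papers) // 2
    while tied and len(high) < half:
        high.append(tied.pop())
    while tied and len(low) < half:
        low.append(tied.pop())
    return high, low, tied  # `tied` now holds any leftover paper


papers = [("a", 30), ("b", 28), ("c", 28), ("d", 25), ("e", 22), ("f", 20)]
high, low, leftover = split_high_low(papers, 28, random.Random(1))
print(len(high), len(low), leftover)  # 3 3 []
```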

It is possible to get a certain amount of teamwork in this operation if a captain is appointed for each of the two groups. The teacher, or the class member with no paper, can write the question numbers in a column on the board and make 4 column headings:

H    L    H+L    H-L

These headings stand for:

H = the number of the "high" group who mark the item correctly
L = the number of the "low" group who mark the item correctly
H+L = "difficulty index," the total number who marked the item correctly
H-L = "discrimination index," how many more of the "high" group than of the "low" group marked the item correctly

When the teacher asks, "How many have item No. 1 correct?" each student with the correct answer on the paper he is watching raises his hand. The captain of the high group calls his number, the "H" score. The captain of the low group calls his number, the "L" score. These two numbers are written on the board, and then the recorder computes and calls out the two scores for "H+L" and "H-L."

These four numbers are always obtained in the same order. Each student writes these four numbers on the answer sheet below each question as it is computed by the board monitor. Each member of the class checks the sum and difference. With a little practice, Diederich says, this item analysis can be carried out for a one-period test in about 10 to 20 minutes, depending on the number of items. This is much faster than the operation could be completed by the teacher. At the same time, an excellent learning situation develops, since each student becomes involved in the test results for the class as a whole and wishes to know why he has missed some of the items.
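The four board numbers for an item follow directly from the two counts called out by the captains. A minimal sketch with a hypothetical function name; the sample counts are those of question 4 in figure 5.

```python
def high_low_row(question, h, l):
    """Return the four board numbers for one item: H, L, H+L, H-L."""
    return {"question": question, "H": h, "L": l, "H+L": h + l, "H-L": h - l}


row = high_low_row(4, 7, 13)
print(row["H+L"], row["H-L"])  # 20 -6
```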

If an item is to be acceptable for inclusion in later tests, the "high-low" difference should be equal to at least 10 percent of the size of the class. For example, with a class of 36 the difference should be at least 4. However, because of the large value of the "standard error," an item whose "true" difference would turn out to be 6 might in some cases give a value of less than 4. In other words, if the difference is small, one should examine the item closely. If it seems to be a well-constructed item, it should be retained. Diederich suggests that "not more than a fifth of the items in the final test should fall below the suggested standard, and the high-low difference should be above 10 percent of the class, preferably 15 percent or more."

The H+L number, which indicates the total number of students choosing the correct answer, indicates the difficulty of the item for the class. The larger the number, the easier the item. In most cases, an item which 90 percent of the class marks correctly is too easy. On the other hand, if less than 30 percent of the class marks it correctly, it is probably too difficult.

Occasionally, especially with a teacher-made objective test, a greater number in the low group will obtain the correct answer than in the high group. Then H-L becomes negative, as in question 4 in figure 5; this is called "negative discrimination." When this occurs, the item needs further investigation. Careful examination of such an item may reveal that a few changes will improve it so that it need not be discarded. To determine what changes are necessary, the teacher might ask each member of the class why he chose one of the incorrect responses, and determine whether or not the key response was poorly written. For example, the correct response of the answer key might not attract the better students if some of the supposedly incorrect choices, or distractors, were actually correct. A rewritten item may be placed in the teacher's item file and tried again in a later examination.



Question     H     L    H+L    H-L
    1        18    18     36      0
    2        16     4     20     12
    3        13     9     22      4
    4         7    13     20     -6
    5         9     7     16      2
    6         5     7     12     -2
    7         9     9     18      0
    8         6     2      8      4

Figure 5. Examples of High-Low Item Analysis (N = 36)



In figure 5 the results of the analysis of several test questions are given for a class of 36 students. Item 1 is an easy item (H+L = 36), since all members of the high and low groups marked it correctly. It has the highest possible difficulty index, 36, which indicates an easy item. (The lower the H+L score, the more difficult the item.) Since all students in each half marked the correct answer, it certainly will have no influence in discriminating between the high and low groups. Unless one desires to begin the test with an easy item, this item would not be used in another test.

Item 7 is harder than item 1, with a difficulty index of 18. Since H = 9 and L = 9, H-L is 0. Therefore, this item will not discriminate between the two groups and would not be used in its present form.

Item 2 is of average difficulty and is the most discriminating item illustrated, with H-L = 12.

Items 3 and 8 just barely meet the criterion for the level of discrimination (H-L), with the suggested value of 4 (i.e., 10 percent of 36 is 3.6, which is rounded to 4). Item 8 is more difficult than item 3, as shown by their difficulty indices of 8 and 22, respectively. In fact, a test should not include many items as difficult as item 8. The teacher might examine this item to determine whether it is measuring a fundamental concept which must be taught again, or whether it is referring to an insignificant detail which should not have been included.

Items 4 and 6 are examples of "negative discrimination." More students in the lower group selected the right answer than in the upper group. Although the difficulty indices suggest that the items are not easy, these items should be rejected until they are examined and rewritten.

Item 5 is more difficult than questions 1 through 4; however, since the discrimination index is only 2, it would not be used in future tests without some revision.
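The rules of thumb used in this discussion can be applied mechanically to the figure 5 data. In this sketch the minimum discrimination is taken as 10 percent of the class rounded up (4 for a class of 36, as in the text), and the difficulty limits are the 90 percent and 30 percent guides given earlier. The function name and the wording of the notes are hypothetical, and such a check only flags items for the teacher to examine, as the text advises.

```python
import math

# (H, L) for each question, transcribed from figure 5 (class of 36).
FIGURE_5 = {1: (18, 18), 2: (16, 4), 3: (13, 9), 4: (7, 13),
            5: (9, 7), 6: (5, 7), 7: (9, 9), 8: (6, 2)}


def review_item(h, l, class_size):
    """List the problems the high-low rules of thumb flag for one item."""
    notes = []
    minimum = math.ceil(class_size / 10)  # at least 10 percent of the class
    if h - l < 0:
        notes.append("negative discrimination")
    elif h - l < minimum:
        notes.append("discrimination below standard")
    percent_correct = 100 * (h + l) / class_size
    if percent_correct >= 90:
        notes.append("too easy")
    elif percent_correct < 30:
        notes.append("too difficult")
    return notes


for question, (h, l) in FIGURE_5.items():
    print(question, review_item(h, l, 36) or "acceptable")
```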

Alternate Responses. The alternate responses, or choices, prepared for multiple-choice items often include those responses which students have been known to make most often in short-answer or free-response questions. For example, in mathematics or science the most frequent incorrect answer choices are those which would result if common errors were made in arriving at a solution. (In order to prevent a student from spending too much time on a problem, the last choice is often "none of the above.") The teacher may be more interested in the kinds of student errors than he is in knowing merely that a certain number of students missed a question. In this situation, the analysis would be carried out by the teacher in this manner: "Question No. 1. How many students selected choice 1?" (pause and record), "How many students selected choice 2?" (pause and record), and so on, for each of the choices for each question. Since in most cases the majority of the class will choose the correct response, the response count takes only a few minutes.
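The choice-by-choice count is a tally of each student's selected option. A sketch assuming a hypothetical encoding in which each response is a choice number or None for an omission; the sample responses are invented. Choices that no one selected are the ones to reexamine.

```python
from collections import Counter


def choice_counts(responses, num_choices):
    """Count how many students chose each answer choice for one question.

    `responses` holds each student's choice (1..num_choices) or None if omitted.
    Returns (counts for choices 1..num_choices, number of omits).
    """
    tally = Counter(responses)
    counts = [tally[c] for c in range(1, num_choices + 1)]
    return counts, tally[None]


counts, omitted = choice_counts([1, 1, 2, None, 1, 4], 5)  # hypothetical class
print(counts, omitted)  # [3, 1, 0, 1, 0] 1
unused = [c for c, n in enumerate(counts, start=1) if n == 0]
print(unused)  # [3, 5]
```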

Item Analysis by Test Scoring Machine. If a school system or a school has the IBM 805 Test Scoring Machine, it may have available the attachment called the Graphic Item Counter. This attachment provides one of the quickest and most accurate ways of making an item analysis. After separating the scored test papers into upper and lower groups on the basis of the total test scores, the machine operator can obtain the number of students in each group marking each response to each question. This information can be obtained for 18 5-choice questions at one time, since there are 90 counters available. If 4-choice, 3-choice, or 2-choice questions are asked, one run of the answer sheets through the machine will handle 22, 30, or 45 questions, respectively. If one wishes to learn only how many students answered each question correctly, as many as 90 questions may be analyzed at one time.
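The capacities quoted for the Graphic Item Counter follow from dividing its 90 counters among the choices of each question and dropping any remainder. A short check of that arithmetic (the function name is hypothetical):

```python
COUNTERS = 90  # counters on the Graphic Item Counter, per the text


def questions_per_run(counters_per_question):
    """Questions one pass can handle when each needs this many counters."""
    return COUNTERS // counters_per_question


for choices in (5, 4, 3, 2):
    print(choices, "choices:", questions_per_run(choices))  # 18, 22, 30, 45
# Counting only correct answers needs one counter per question.
print("correct only:", questions_per_run(1))  # 90
```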

The procedures suggested above are for use with a single class or a department in one school. In developing and standardizing a new test, more cases would be needed than those of a single classroom, and the procedures which should be followed are described briefly in Understanding Testing³ or given in detail in Educational Measurement.⁴ In making an item analysis for a single classroom, it seems appropriate to divide the class into halves: upper half and lower half. If an item analysis is based upon a test administration to 400 or more students, then the upper and lower 27 percent of the total group will give the best results.
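The choice between halves and the upper and lower 27 percent can be sketched as follows. The function names are hypothetical, and rounding 27 percent of the group to the nearest whole paper is an assumption the text does not spell out.

```python
def comparison_group_size(n):
    """Papers in each of the upper and lower groups: halves for a classroom,
    the upper and lower 27 percent when 400 or more students took the test."""
    if n >= 400:
        return round(0.27 * n)  # assumption: nearest whole paper
    return n // 2


def upper_lower(scores):
    """Split total scores into the upper and lower comparison groups."""
    ranked = sorted(scores, reverse=True)
    k = comparison_group_size(len(ranked))
    return ranked[:k], ranked[len(ranked) - k:]


print(comparison_group_size(36), comparison_group_size(500))  # 18 135
```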

An Item Analysis Sheet can be mimeographed with the headings and form given in figure 6. By using legal-size paper, it is possible to analyze 10 questions in each column.

The figures for the "No." columns under "Upper Group" and "Lower Group" are obtained directly from the Graphic Item Count Record. The "No." under "TOTAL GROUP" is the sum of the quantities under "No." in the Upper Group and Lower Group. The percents are obtained by dividing the recorded numbers by the number in the upper or lower group and in the total group. An example will make these calculations clear.

Suppose that there are 40 students in a class and the division into halves places 20 students in the Upper Group and 20 students in the Lower Group. In item 1, choice 1 was marked by 15 students in the Upper Group and 5 students in the Lower Group. In the TOTAL GROUP, 20 (15 plus 5) marked choice 1. Choice 2 was marked by 1 student in the Upper Group and 1 student in the Lower Group, which gives a sum of 2 for the TOTAL GROUP. This procedure continues for each choice for each item in the test.

³ Kenneth F. McLaughlin, "How a Test Is Built," in Understanding Testing, Washington: U.S. Government Printing Office.
⁴ "Item Selection Techniques," in Educational Measurement, Washington: American Council on Education.

ITEM ANALYSIS SHEET

Key:
O     Correct item choice
✓     Item discrimination less than desired
✓✓    Negative discrimination
X     A large number chose the same incorrect choice

Requirements for Satisfactory Item Discrimination (for Correct Choice)

Range of values           Difference
(Upper group and          (Upper group minus
Lower group)              Lower group)
90-100                    5 or more
80-90                     10 or more
20-80                     15 or more
10-20                     10 or more
0-10                      5 or more

Example: Choice 1 is the correct answer for Item No. 1. 75% of the Upper Group and 25% of the Lower Group chose choice 1. These values lie in the 20-80 range, requiring a difference of 15% or more to be acceptable (75 - 25 = 50%). Therefore, the item discriminates satisfactorily. The TOTAL GROUP % for the correct answer, choice 1, is 50. Hence the difficulty index is 50%.

Figure 6. Sample Item Analysis Sheet




For rapid computation one can easily construct a table of percents corresponding to the number of students in half of the total group, going from 1 (which is 5%) to 20 (which is 100%). Then one fills in the % columns in the item analysis sheet for the Upper Group and Lower Group. (If this is done with a colored pencil, later analysis will be easier. In figure 6 the % columns have been shaded.) In item 1 this becomes, for choice 1, 75 and 25; for choice 2, 5 and 5; for choice 3, 15 and 35; etc. The sum of the percents in either of these columns should not exceed 100 by more than 3, which is the maximum which might occur in some classes because of rounding errors. The total may be less than 100 if one or more students omit a question.
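The two lookup tables of percents can be generated rather than computed by hand. This sketch rounds halves upward to match the text's example (1 of 40 is 2.5%, recorded as 3%); Python's built-in `round()` would instead round 2.5 down to 2, so integer arithmetic is used. The function name is hypothetical.

```python
def percent_table(group_size):
    """Map each possible count 1..group_size to a whole percent of the group.

    Halves are rounded upward, as in the text; integer arithmetic avoids
    floating-point surprises: (200k + n) // (2n) == floor(100k/n + 1/2).
    """
    return {k: (200 * k + group_size) // (2 * group_size)
            for k in range(1, group_size + 1)}


half = percent_table(20)   # Upper Group or Lower Group of 20
total = percent_table(40)  # TOTAL GROUP of 40
print(half[1], half[15], half[20])  # 5 75 100
print(total[1], total[20], total[40])  # 3 50 100
```

Because of this rounding, a TOTAL GROUP column can sum to slightly more than 100, as the text warns.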

Another table of percents should be constructed corresponding to the number of students in the total group, in this case going from 1 (which is 2.5%, rounded to 3%) to 40 (which is 100%). Then one fills in the column under TOTAL GROUP. (One can save these tables and develop new ones as they are needed when class size changes, because of absences at test time or changes of class size in a new school year.)

It has been shown in the literature that the test best able to put a class of students in rank order is one which has item difficulties spread over most of the range, but which has an average item difficulty of 50%, with the greatest number clustering about 50%.

As the next step, examine each test item in figure 6 and code it as suggested: a circle (O) around the correct answer choice; no further mark if the item appears satisfactory; a single check (✓) if the item discrimination is less than desired; a double check (✓✓) if there is negative discrimination; and an "X" if a "large number" select the same incorrect choice.
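The check-mark coding can be sketched from the requirements table on the sample Item Analysis Sheet. Two assumptions are labeled in the code: the table is read as requiring differences of 5, 10, 15, 10, and 5 percent for the ranges 90-100, 80-90, 20-80, 10-20, and 0-10 (some entries are hard to read in the source), and when the upper and lower percents fall in different ranges, the range is chosen from their average. The "X" mark is omitted because the text does not define a "large number." All names are hypothetical.

```python
# Assumed reading of the sheet's requirements table:
# (low bound, high bound, minimum upper-minus-lower difference), in percent.
REQUIREMENTS = [(90, 100, 5), (80, 90, 10), (20, 80, 15), (10, 20, 10), (0, 10, 5)]


def required_difference(upper_pct, lower_pct):
    """Minimum difference for the range the item's percents fall in.

    Assumption: the range is picked from the average of the two percents,
    and a value on a shared boundary goes to the first matching row.
    """
    mean = (upper_pct + lower_pct) / 2
    for low, high, minimum in REQUIREMENTS:
        if low <= mean <= high:
            return minimum
    raise ValueError("percent out of range")


def code_item(upper_pct, lower_pct):
    """'' if satisfactory, '✓' if discrimination is less than desired,
    '✓✓' if discrimination is negative (percents for the keyed choice)."""
    diff = upper_pct - lower_pct
    if diff < 0:
        return "✓✓"
    if diff < required_difference(upper_pct, lower_pct):
        return "✓"
    return ""


# Items 1, 2, 11, and 12 from the discussion of figure 6.
print([code_item(u, l) for u, l in [(75, 25), (100, 90), (40, 60), (40, 30)]])
```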

In item 1 the correct answer is choice 1; 75% of the Upper Group and 25% of the Lower Group marked it correctly. Since there is a difference of 50% (75 minus 25), which is much greater than the suggested minimum difference of 15%, this item discriminates satisfactorily and would be a good one to include in future tests, if its other responses are satisfactory. Each of the other choices was operating, since each was chosen at least once by some member of the class.

Item 2, with choice 2 as the correct one, is an easy item: 95% of the total group of students marked it correctly. The higher the percent, the easier the item. The item does discriminate satisfactorily at this level, since there is a difference of 10% (100 minus 90). Choices 1 and 4 should be reexamined, since no one chose them. Some test constructors believe that a few easy items of this difficulty level at the beginning of a test help to put the examinees at ease. Almost every student's score is raised one point by such an item, and his relative rank may not be changed at all when one considers the complete test.

Item 11, the number of the item at the top of the second column in the Item Analysis Sheet, shows that each choice was selected by some of the students. The double check (✓✓) indicates that there is "negative discrimination" with this item, which means that more students in the Lower Group chose the keyed answer than in the Upper Group. As a result, one obtains a discrimination index of minus 20% (40 minus 60). This item does not assist in ranking the students in the proper order, but rather makes the rankings less dependable. The difficulty index, as shown in the TOTAL GROUP % column, is 50%, the same as item 1, but item 11 should not be used. One should examine items of this type to be sure that one has not made an error in developing the answer key. Because of rounding, the TOTAL GROUP percents for all choices add to 101.

Item 12 is an example of a question which does not discriminate at the desired level of 15% but only at 10% (40 minus 30). However, if reconsideration of the item shows that it is a good item and important to the course, retain it. Choice 1 should be changed; it was so poor that no one selected it. Choices 2 and 5 are chosen by only one student each and are much weaker than choice 4. Choice 4 must be considered, since it has been marked with an "X." Why did so many students in both the Upper and Lower Groups select it? Is it the statement of a commonly accepted fallacy? Is it so ambiguous that in one sense it may really be correct? Does this question cover a basic part of the course which needs reteaching? Has this question been keyed properly? If choice 4 should be determined to be correct rather than choice 3, then one would have "negative discrimination," as in question 11. Since one student in each group omitted the question, the TOTAL GROUP percents add to only 96.

Comments should be made concerning test items omitted by the students. As one becomes experienced in examining an Item Analysis Sheet, he quickly becomes aware, from the low numbers in the TOTAL GROUP column, of the few items which many students failed to answer. If the test is timed, these items would come, in most cases, near the end of the test. If they occur randomly throughout the test, the teacher should examine the lesson plans to be certain that these topics have been previously covered, and then reteach them if necessary.

When the item analysis has been completed, a summary table may be made of the number of single or double checks or X's. As one becomes more skillful in constructing one's own tests and using again items which have been tried out and found successful, he will discover the number of marks diminishing. However, it will be a rare occasion when, for any given class, there will be no marks. This would also be true of standardized tests, which can be analyzed in a similar manner in order to discover the weak points and errors in thinking of the students.

If each item used on a test is typed or pasted on a separate card, cataloged as to topic, and the aforementioned kinds of information concerning discrimination and difficulty recorded, it is possible to build a pool of good items which can be used in later classes. By recording when each item is used, the repetition of the same items in succeeding terms or years can be avoided. If the foregoing analysis shows that an item is poor, it should not be used unless it is rewritten.

Item Analysis by Typewriter. If a teacher wishes to make an item analysis himself, he can speed up the procedure by using what is called the "typewriter method." This method will be described in detail.

After sorting the papers in order according to their scores, highest to lowest, divide the papers into two halves at the median, as described previously for the "High-Low" Analysis. Select the pile of answer sheets for the "high" group first, still arranged with the highest score on top. Sit at a typewriter and select any set of five keys, if five-choice multiple-choice items have been used. For example, one might choose to use the keys on the typewriter with the letters or symbols at the "home position" for the right hand, corresponding as follows:

Answer choice    1   2   3   4   5
Typewriter key   j   k   l   ;

If a student omits an item, then strike the space bar. 

Place the paper of the student with the highest score under the left hand and use a finger to guide down the answer column, question by question. In typing with the right hand one will feel uncertain for the first two or three papers but will soon establish a typing pattern. For example, the teacher looks at the response to question 1, observes that choice 2 was selected, and types “k”; for question 2 he observes that the student selected
the last, or fifth, choice, so he types “c”; for question 3, with choice 1 indicated, he types “j”; question 4 was omitted, so he uses the space bar; for question 5, with choice 3, he types “l”; for question 6, with choice 4, he types “;”; for question 7, with choice 1, he types “j” as in question 3, and so on. This procedure should be continued until a symbol or space has been typed for each item for a single student on the same line. One may also type the student’s name, if desired. The responses for 30 items, including the seven above, might look like this, with an extra space following question 15 included as a tallying aid:

kcj l;jlklj;lkj ;kcj;kljkc;jklj
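The typing step amounts to encoding each student’s answers as one line of characters. A minimal sketch, not part of the original method: it assumes the key assignment in the table above (with “c” for the fifth choice, as in the worked example), `None` for an omitted item, and a tallying space after every 15th item; the name `type_line` is hypothetical.

```python
# Keys for answer choices 1-5; "c" for choice 5 follows the worked example.
KEYS = {1: "j", 2: "k", 3: "l", 4: ";", 5: "c"}


def type_line(responses, group_size=15):
    """Return the line a teacher would type for one student.

    responses: answer choices 1-5, or None where the item was omitted.
    """
    chars = []
    for number, choice in enumerate(responses, start=1):
        # An omitted item is recorded with the space bar.
        chars.append(" " if choice is None else KEYS[choice])
        # Extra tallying space after every group_size items (not after the last).
        if number % group_size == 0 and number < len(responses):
            chars.append(" ")
    return "".join(chars)
```

For the first seven responses described above, `type_line([2, 5, 1, None, 3, 4, 1])` gives `kcj l;j`.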

The next highest test paper of the “high” group should be recorded in the same manner on the second line (do not double space). This routine should be continued until the responses to each question for each paper in the “high” group have been recorded. Thus the responses of every student to each question are always in a vertical column, one below the other. At the end of the high group, triple space and proceed in the same manner for the “low” group.

Experience has shown that it is sometimes helpful to space systematically for each paper as one records letters for the answers. For example, if regular IBM answer sheets are used, space after items which are multiples of 15, i.e., after items 15, 30, 45, 60, etc. If answer sheets designed for a specific standardized test are used, the spacing will vary. If one uses an answer sheet of his own construction, an appropriate place to space might be at the end of each column of answers. This provides a visual check for the end of each group of questions. If the teacher’s own answer sheet has been keyed with the correct responses, double space and type it in the same relative position below the answer rows for the “high” and “low” groups.

Figure 7 shows part of an Item Analysis by Typewriter for 30 questions, with details for question 15. Note the separation of the high and low groups and that “H” and “L” are in the same order, from top to bottom, as the groups at the top of figure 7. A straightedge placed on the paper vertically and to the left or right of each question’s responses permits a rapid count of the number of responses for each group which are the same as that of the answer key. The interpretations made previously for the “High-Low” Analysis now apply.
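Counting with the straightedge amounts to comparing each column with the answer key. A minimal sketch, assuming every typed line (including the key line) has the same length and the same tallying spaces; the function name `high_low` is hypothetical:

```python
def high_low(high_lines, low_lines, key_line):
    """Count correct responses per item in the high (H) and low (L) groups.

    Each line is one student's typed responses, aligned column by column
    with key_line. Returns (item number, H, L, H - L) for every item.
    """
    results = []
    item = 0
    for col, correct in enumerate(key_line):
        if correct == " ":          # tallying space in the key, not an item
            continue
        item += 1
        h = sum(1 for line in high_lines if line[col] == correct)
        low = sum(1 for line in low_lines if line[col] == correct)
        results.append((item, h, low, h - low))
    return results
```

An item whose H - L difference is small or negative is one that did not separate the high and low groups and deserves a second look.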

At the bottom of figure 7 it is shown that with this typewritten method it is also possible to make the “alternate response” analysis by adding a row for each possible response choice below the “High-Low” Analysis. One counts and records the number of responses for each choice or omitted item. One can then discover the most common errors of the class.

Figure 7. Sample of an Item Analysis by Typewriter With a Detailed Analysis of Item 15 (N = 36). Panels: student response patterns for the high and low groups, with the answer key; High-Low analysis; alternate-response analysis.
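The alternate-response rows can be produced the same way by tallying every choice in one column rather than only the keyed answer. A sketch under the same assumptions as before (keys j, k, l, ;, c and a space for an omitted item); `response_counts` is a hypothetical name and `col` is the character column on the typed sheet:

```python
from collections import Counter


def response_counts(lines, col, keys="jkl;c"):
    """Count how many students in `lines` chose each key, or omitted, in one column."""
    counts = Counter(line[col] for line in lines)
    table = {key: counts.get(key, 0) for key in keys}
    table["omit"] = counts.get(" ", 0)   # the space bar marks an omitted item
    return table
```

Running it once on the high-group lines and once on the low-group lines gives the two rows of the alternate-response analysis for that item.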

This method can also be used to give error analysis information. With the vertical straightedge in position it is possible to circle or underline in red the incorrect responses. Since each row corresponds to a single student, individual help can be given as
needed to each student on the specific topics or areas covered by 
each item. 
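The same comparison, read across a row instead of down a column, gives each student’s list of missed items. A sketch assuming the typed-line layout above; `missed_items` is a hypothetical name, and an omitted item is counted as missed:

```python
def missed_items(name_to_line, key_line):
    """Map each student's name to the item numbers answered wrongly or omitted."""
    report = {}
    for name, line in name_to_line.items():
        missed = []
        item = 0
        for col, correct in enumerate(key_line):
            if correct == " ":      # tallying space in the key line
                continue
            item += 1
            if line[col] != correct:
                missed.append(item)
        report[name] = missed
    return report
```

Each list points to the topics on which that student may need individual help.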

In other words, with a little preplanning and with one handling 
of the test papers it is possible to use this typewriter method to 
derive a great amount of useful information. Other kinds of 
interpretations will suggest themselves as one uses this technique. 






VIII. Classroom Interpretation of Test Scores 

If a test or test battery is worth administering to all of the students in a class, school, school system, or State, the results should be reported to the students, teachers, administrators, and parents. The parents should be most interested in the meaning of the test scores for their children.

Before the day of the test the students should have been notified that the test was coming, told the purposes of the test, and shown how the results might be interpreted. Afterwards, since they have been involved in preparation for the test and have spent a number of hours diligently marking their answer sheets, the results certainly should be explained to them as soon as possible.

The larger the testing program, the longer the interval between 
the administration of the tests and the report of results. How- 
ever, automatic data processing is now available which provides 
an accurate and economical method for making results available 
to the schools within a maximum of 3 to 4 weeks. These reports 
often include list reports of scores by class, grade, building, city, 
or State, together with individual press-on label reports for the 
student’s cumulative record folder, the teacher’s grade book, and 
the student’s interpretive leaflet. 

As soon as the test results are available, the teachers should be given an interpretation of the scores by a qualified principal, by one of the school counselors, or by the guidance director of the school system. Staff meetings for this purpose can include presentations of meaningful interpretations of the results as related to the school, as well as suggestions for interpreting the results to the students.

The next step is to explain the results to the students. One
way to accomplish this is to have the counselor or teacher explain 
to each student individually what the scores mean and how they 
are good measures of his strengths and weaknesses. However, 
such a procedure is usually not an efficient use of either the 
teacher’s or the counselor’s time. 

One procedure which has been successful is to have the teacher
or counselor make a general explanation to a whole class. First, it 
should be pointed out to the students that the results obtained from a battery of tests are a private matter. A student is under no obligation to show his results to anyone, nor should he ask to see the test results of others. Such a statement may prevent student embarrassment.

Since the students may have forgotten the types of items in the subparts of a single test or in each major portion of a testing program, which may have included a scholastic aptitude test and several achievement tests, it would be appropriate to review the purpose of each test and recall sample items of each. This may help the students to relate the type of test to their own scores. Of course, with a long standardized test battery, it would not be practical or worthwhile to examine each test item with the students.

The next step is to explain to the students the way in which the test scores are presented for interpretation. Results may be reported as raw scores, scaled scores, percentiles, age-equivalents, grade-equivalents, or stanines. The manual which accompanies each test explains the kinds of scores which are available and how they are derived. Each teacher should have her own copy of the interpretation manual for each test used in a testing program. Information as to how the scores are derived and what they mean will be of interest to many of the students.

The class should be told that a “national” percentile of a standardized test indicates the percentage of individuals in the group of students used to establish the test norms who made scores below that of the student. For example, a student at the 75th percentile scored better than 75 percent of the students of the norming population. It does not mean that the student missed only 25 percent of the test items.
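The definition of a percentile rank translates directly into a count. A minimal sketch, assuming the norm group’s raw scores are available as a list (published norms are normally supplied as tables in the manual instead); some manuals also credit half of any tied scores, a variant not shown here. The name `percentile_rank` is hypothetical:

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group whose scores fall below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)
```

With norm-group scores 1 through 100, a score of 76 has 75 scores below it, so `percentile_rank(76, list(range(1, 101)))` returns 75.0. Nothing in the calculation refers to the number of items the student missed.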

It should also be pointed out that, because of errors of measurement, one should think of the reported percentiles as a band rather than a particular point of, say, 75. That is, with a reported percentile of 75 the true score of the student might lie somewhere between 70 and 80. Further, differences of percentiles between two achievement tests should not be considered significant unless the scores are separated enough so that these bands would not tend to overlap. For example, suppose that a student ranked at the 73d percentile in mathematics and the 77th percentile in English. If the interpretation manual indicates that there is an “error of measurement” of 5 percentile points at this part of the distribution of scores, then the probability is about 2 out of 3 that a band of scores from 68 to 78 includes the mathematics score and a band
of scores from 72 to 82 contains the English score. Since the scores of 73 and 77 occur in both bands, it is quite possible that with a second administration of similar tests the percentiles could be reversed. In other words, for these tests at this part of the distribution the difference in percentiles is not significant.
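The band comparison can be written as a simple overlap test. This sketch assumes a symmetric band of plus or minus one error of measurement (5 percentile points, as in the example) and treats a difference as meaningful only when the two bands do not overlap; the function names are hypothetical, and the probability attached to a band is not modeled:

```python
def band(percentile, error=5):
    """Range of percentiles likely to contain the student's true standing."""
    return (percentile - error, percentile + error)


def difference_is_meaningful(p1, p2, error=5):
    """True only when the bands around p1 and p2 do not overlap."""
    low1, high1 = band(p1, error)
    low2, high2 = band(p2, error)
    return high1 < low2 or high2 < low1
```

For the example above, `band(73)` gives (68, 78), `band(77)` gives (72, 82), and `difference_is_meaningful(73, 77)` returns False.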

An explanation should be given to the students of the meaning of “norms.” One can explain, for example, that a sample consisting of a cross section of students of a grade and age similar to themselves was selected from all parts of the country, from industrial and agricultural areas, from large and small schools, and from prosperous suburban and crowded urban classrooms; that all of these students were given the same tests; and that a percentile or some other type of derived score was computed and these results published as the “national norms.” It should be pointed out that if tests from several different publishers are used as a part of a testing battery, the “national norms” were established on different samples of students, which might account for certain small unexpected differences.

“Local norms” are established on the basis of students within a smaller area: a State, county, city, or school. If norms are constructed for the same students for several tests at the same time, they will be comparable. Also, these local norms compare the student with his peers in his own community.

One of the most meaningful ways to analyze the results of a test battery is by means of a profile chart. This can be explained by constructing a sample profile on the blackboard, by preparing a large chart ahead of time, by using a flannel board on which the scores and lines may be placed, or by using a ruled metal board with magnetic spots to represent the scores and lines to join them.

After these explanations, each student should be given an unmarked profile sheet and a copy of his scores. Some test publishers furnish such profile sheets with their tests. If not, such sheets can be easily drawn and duplicated.

The plotting of profiles for a class would require much teacher time. Rather than distributing individually plotted profiles, it is quite acceptable, under proper supervision, to let each student plot the marks corresponding to his national percentile scores on the proper line for each test and then join the marks with a solid black line to form his own profile. This solid line represents his positions relative to the norms. It is then possible for the student to see the peaks and the valleys which indicate his own relative strengths and weaknesses. Those points on the profile which are low will indicate the weak areas which may need further study, while the high points will emphasize the apparent strengths and may suggest several areas for future specialized study. If the student is planning post-high-school education, it may be helpful to learn how he compares nationally with those with whom he will be competing. Certainly, if he has such ambitions, he should be especially concerned with those areas in which he falls much below the median, or 50th percentile.

Figure 8. Profile Sheet for Sam Smith of Patunka High School, Grade 10

                          Aptitude                     Achievement
                     Verbal  Quant.  Total   English  Soc. St.  Science  Math.  Total
Raw score              62      45     107      110       55        61      48      -
National percentile    30      85      60       35       20        90      95     67
Local percentile       15      81      45       25       30        96      80     55

(Solid line: national percentiles. Broken line: local percentiles.)

If available, each student may be given his standing in terms of local percentile norms. These points can be plotted and connected by a broken line or drawn in color. This line will show his relative standing as compared with his own peers or those of the surrounding community. These results may be helpful if he plans to remain in the same geographical region and compete for jobs in the local labor market. Those students included in the local norms are the types of people with whom he probably will be competing.
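A rough text version of such a profile, using the national and local percentiles read from figure 8, can be sketched as follows. The data layout, the function names, and the rule of calling a test a strength or weakness only when both percentiles fall on the same side of the median are illustrative assumptions, not the author’s procedure:

```python
# Percentiles as read from figure 8: test -> (national, local).
SAM_SMITH = {
    "Verbal": (30, 15),
    "Quantitative": (85, 81),
    "English": (35, 25),
    "Social Studies": (20, 30),
    "Science": (90, 96),
    "Mathematics": (95, 80),
}


def text_profile(scores, width=50):
    """One row per test: '*' marks the national percentile, 'o' the local one."""
    rows = []
    for test, (national, local) in scores.items():
        line = [" "] * (width + 1)
        line[round(local * width / 100)] = "o"
        line[round(national * width / 100)] = "*"  # drawn last, so it wins a tie
        rows.append(f"{test:<15}|{''.join(line)}| {national:>2} / {local:>2}")
    return "\n".join(rows)


def classify(scores, median=50):
    """Label a test a strength or weakness only when both norms agree."""
    labels = {}
    for test, (national, local) in scores.items():
        if national > median and local > median:
            labels[test] = "strength"
        elif national < median and local < median:
            labels[test] = "weakness"
        else:
            labels[test] = "mixed"
    return labels
```

`print(text_profile(SAM_SMITH))` draws the peaks and valleys, and `classify(SAM_SMITH)` marks verbal aptitude, English, and social studies as weaknesses and quantitative aptitude, science, and mathematics as strengths.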

According to the student profile in figure 8, Sam Smith is a little better than average in overall aptitude. His total raw score of 107 places him at the 60th percentile in terms of national norms, or in the top 40 percent. In terms of local percentiles he stands somewhere in the middle of the distribution of his classmates.

In verbal aptitude, nationally, Sam ranks in the bottom third of students of his own age and grade. On local norms he is in the lowest fifth of his class. However, in quantitative aptitude he exceeds more than four-fifths of his peers, for he scores at approximately the 86th percentile on national norms and the 81st percentile on local norms. If one allows for an error of measurement of 5 percentile points, he is still in the top quarter both nationally and locally. When one combines the verbal and quantitative aptitude scores to obtain a total score, Sam appears to be a little above average on the national norms and a little below average on the local norms.

Sam’s strongest achievement areas are not English and social studies. He is in about the lowest third of the class in these subjects and may need extra help. However, it is not too surprising that Sam is weak in these areas, since his verbal aptitude is low, and research shows a relationship between verbal ability and success in English and social studies.

Sam’s greatest strength seems to be in mathematics and science. He is in the top 10 percent in science on both national and local norms. One may question why in mathematics he stands at the 95th percentile on the national norms and at the 80th percentile on the local norms. Test scores cannot tell us the reasons, but they can point out areas which need further thought or investigation. One explanation might be that at Patunka High School many of the students are naturally good in mathematics. Another reason might be that an unusually dedicated mathematics
teacher has motivated the students to do much better than students of preceding years. All of those who equaled or exceeded Sam’s score would rank at the 95th percentile or better on the national norms. On local norms about one-fifth of the students are better than Sam, so his local percentile drops to 80.

The total achievement score places Sam in the top third on national norms and in the upper half on local norms. No total raw score is given in the boxhead for the achievement tests, since it would be meaningless because of the different test lengths. The total achievement percentiles were computed by methods outlined in the test manual for the tests used.

After the general class discussion by a trained teacher or by the counselor of the school, and after the students have plotted their own profiles, an opportunity should be given for any general class questions concerning the meaning of the scores. At the conclusion of the discussion, the students should be encouraged to make appointments for individual consultations concerning the test results. By explaining to the class as a whole the general meaning of these test results, many hours of individual explanations and interpretations will be saved, and the students themselves will be better informed and better prepared for individual counseling.



IX. Interpretation of Test Results to Parents 

Individual Conferences 

How much general information or how many details concerning the test results should be given to a parent during a conference about his child? As much information and as many details should be given to the parent as the principal, the counselor, or the teacher believes can be understood and used properly. This does not mean that test results should be considered “top secret,” but that nothing is gained by making available information which may be misinterpreted.

For example, as a general policy most schools will not indicate to a child or to his parents the exact level of the child’s intelligence quotient (or IQ) because the concept of the IQ is misunderstood by many people. It is even difficult to get psychologists and educators to agree upon a definition of intelligence acceptable to all. Some parents believe that an IQ is a precise measure of their child’s ability, rather than an indicator of the approximate range of values in which the child’s IQ lies, and will praise or condemn on the basis of this single number.

Some people talk about a verbal intelligence, a mathematical or quantitative intelligence, a mechanical or manipulative intelligence, a spatial intelligence, and so on. Studies which have been made in the past have attempted to identify certain “factors” which together seem to make up intelligence. These factors occur in varying amounts in different individuals. Studies have indicated that success in certain occupations may be reasonably expected when an individual possesses specified minimums of some of these factors. No tests have been developed to date which will measure accurately and reliably the motivation or drive of a student, and it is quite possible that students who rank below statistically derived cutoff scores on some tests, say, in mathematics or English, might do well in these same areas. However, the odds are against such an accomplishment, and the student and parent should be aware of these facts as they make decisions concerning future education and occupational preparation.

It is considered legitimate to indicate to the parent that his child seems to have unusual ability, as shown by an intelligence
test or a scholastic aptitude test, and to suggest that his area of strength is in the verbal areas rather than quantitative areas or vice versa. On the other hand, it would be acceptable to indicate that a child who is in the lower quarter of the student population in ability seems to be doing as well in his work as can be expected.

Similarly, it can be pointed out to a parent that, if the child seems to be particularly gifted and the results of achievement tests seem to indicate mediocre attainment, the child is not working up to his expected capacity and he should be encouraged to use his abilities better. If the gifted child comes from a family that in the past has not made learning opportunities available, perhaps the parents can be shown the possibilities of their child and encouraged to help him gain educational experiences and materials from sources outside the school.

It is important also to point out contradictions in the data or conflicts between test results and the observations of teachers or counselors. The child’s cumulative record may suggest that further individual testing is needed. The parent should understand the reasons for additional testing.

Most parents are eager to learn the level of ability of their child. They may have pertinent suggestions to offer as to why the child either is not doing as well as expected or is doing better than expected. Such views should be incorporated in the student’s cumulative record for future reference.

Sometimes questions arise concerning a child who does not seem to be meeting even the minimum standard for his grade, although his test scores suggest that he has the ability to be at the top of his class. If the parent cannot give reasonable explanations for this, then the situation requires further investigation by the counselor or other members of the pupil personnel staff.

Group Conferences 

The ideal way to explain a school’s testing program and to present a student’s test results to his parents is by an individual parent conference. However, with the current student-counselor ratio of 500 to 1, or greater, in many of our schools, there are not enough hours in a day to carry out such a procedure. Most counselors do not have the time to make sure that each family understands the philosophy of testing, the reasons certain tests were selected, and the meaning of the scores.

When long individual conferences are not possible, the next best thing is to have a number of the parents meet, at which time they may be given general background information. Such gatherings
might be one of the regular parent-teacher association meetings or a meeting called for this specific purpose. In some schools it has been found convenient and helpful to invite the parents of a single class for the discussion of tests and the presentation of basic information. However, in a large elementary or high school, it would take many weeks for trained personnel to reach all parents in this manner. Because of the interest in testing, it has been found that parents respond enthusiastically to the announcement of an opportunity to discuss different kinds of tests and how they may be used.

The question arises as to the best procedure for a large group discussion of tests. Several approaches are possible. In one type of program, the first half of the scheduled time may be devoted to a discussion of the current testing program by a counselor or testing specialist. A summary may be given of the complete testing program of the school or city. The purpose of each kind of test at each grade level is presented. Parents are told that tests may be administered in the fall to help the teacher learn quickly the level of ability of each of his students. Further, some tests are administered in both the fall and the spring in a few grades or classes in order to compare the effectiveness of new teaching methods. Other tests are administered at various times in some senior high schools to determine possible scholarship winners for colleges and universities.

After this general discussion, the particular tests which have just been completed by the students are described. Sample items may be shown, either by distributing a mimeographed sheet containing the illustrative sample items which were used by the students with the tests, by using slides or an overhead projector, or by using an opaque projector. It is helpful to show the parents some of the variety of forms in which objective test items occur and to emphasize that one does not test for facts alone, as popular writers imply, but that one can test for basic skills and the ability to reason as well. Many of the older parents were not “subjected” to such skillfully designed standardized tests as are available today. When they were in school, citywide and statewide programs did not exist or were just being developed.

The length of a test can be discussed. Other things being equal, the longer a test, the more dependable the results. Many times a test requiring 45 minutes is preferable to another test with a similar title which requires only 10 minutes. The testing specialist says that the longer test is more “reliable,” i.e., if, within a few days, a student took the same test or a parallel form, he would
receive approximately the same score. (A parallel form of a test is one which was constructed by the same author according to the same table of specifications by using items from the original pool of questions.)
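The statement that a longer test is more dependable is often quantified with the Spearman-Brown prophecy formula, which the text itself does not use. It predicts the reliability r_k of a test lengthened by a factor k with comparable items from the original reliability r: r_k = k*r / (1 + (k - 1)*r). A minimal sketch (the function name is hypothetical):

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a test by `length_factor`
    with items comparable to the original ones (Spearman-Brown formula)."""
    r, k = reliability, length_factor
    return k * r / (1 + (k - 1) * r)
```

For a hypothetical 10-minute test with reliability 0.60, lengthening it to 45 minutes of comparable items (k = 4.5) gives a predicted reliability of about 0.87, assuming the added items behave like the original ones.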

One can explain the meaning of the various kinds of norms which have been used and briefly explain in nontechnical terms how they were derived and what they mean. Sample profiles can be distributed or displayed on a screen so that all present can follow their interpretation. A profile similar to the one described in the preceding section would be helpful.

After this presentation, it is possible to open the meeting to questions. The chairman may accept questions from the floor. Another method is to distribute cards early in the program so that those present may write their questions. As soon as these cards are collected, it is possible for the moderator to group related or duplicate questions quickly and to determine the difficulties or lack of understanding among those present. Specific questions can be read and answered without embarrassment to any parent.

Another way to prepare for a parent meeting is to circulate to the parents, via the students, a series of possible questions and topics concerning testing and the meaning of the results. The parents should be asked to check those questions which are of most concern or of most interest to them and to return the forms at least 2 weeks before the scheduled meeting. A quick tally will indicate those topics which should be included in the program. Such a questionnaire also helps the parent to think about the discussion topics and to formulate questions for the discussion period.
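The quick tally of returned forms is a frequency count. A sketch, assuming each returned form is represented as the list of topics the parent checked; `top_topics` and the topic labels in the usage note are hypothetical:

```python
from collections import Counter


def top_topics(returned_forms, how_many=3):
    """Tally the topics checked on the returned forms; return the most requested."""
    # dict.fromkeys drops a topic checked twice on one form, keeping order.
    tally = Counter(topic for form in returned_forms for topic in dict.fromkeys(form))
    return [topic for topic, _ in tally.most_common(how_many)]
```

For forms such as `[["norms", "IQ"], ["IQ", "percentiles"], ["IQ", "norms"], ["percentiles"]]`, the IQ topic is checked most often and would lead the program.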

In a third type of meeting, during which anonymous case studies may be presented, the first part of the allotted time is devoted to a short discussion of the tests used. Then a sheet of letter-sized paper, folded over once and stapled, is distributed to all parents. All persons are cautioned not to remove the staple until told to do so.

Without opening the sheet, each person can read on the visible portion of the paper all of the information available concerning one anonymous student. Information about the student would include his attendance record, course grades, extracurricular activities, interest measures, test scores, and general family and community background.

Each person present would be asked to think about the relationship of the test scores to all other information. These questions could be raised:

1. What would each parent tell this student on the basis of the information available?

2. Is all of the information necessary?

3. Would the test scores alone be sufficient to indicate student needs?

4. Is the information adequate to assist the student without the test scores?

5. What is meant by national and local norms? (These could be explained.)

A few volunteers from the audience might be willing to attempt an interpretation of the results and suggest proper action. After everyone has exhausted his ideas, the sheets may be opened so that everyone may see one of several possible professional interpretations based upon the listed facts.

Programs of this case study type have proved worthwhile as a basis for group discussion. The inclusion of several contrasting cases can be helpful. The use of a different colored paper for each case will ensure that all are talking about the same one.

Timing is an important consideration in discussions of tests with parents. Such meetings are most helpful to the parents if they occur before or at the same time as the distribution of test results to the students. Advance publicity through the local press announcing the early availability of test results and explanatory meetings can also prove helpful.

There is some difference of opinion as to whether test results should be sent home to the parents. Certainly, no test results should be distributed by mail or taken home by the child unless accompanied by a short description of the test and a simple explanation of the meaning of the results. This principle is frequently ignored. Many times a child has brought home a piece of paper with numbers, but with no indication as to whether they represent low scores or high scores, bad scores or good scores.

The story is told that when one parent was given a test report by his child he asked, “What’s this?” The child replied, “Tests!” This was the total amount of explanation available to the parent!

A cartoon¹ which appeared in 1955 illustrates one possible misinterpretation of test results. A distraught mother was shown calling the doctor about her son, who had just brought home a card indicating that he had an IQ of 105. She wanted to know whether he should be put right to bed. Obviously, no information had been sent to explain the meaning of this number to the parent. On the other hand, some schools send home with the child a 4-page brochure explaining the purposes of the tests and the meaning of the scores and inviting the parent to make an appointment for a conference if further information is desired.

¹ See Test Service Bulletin No. 54, December 1959, “On Telling Parents About Test Results,” by James H. Ricks, Jr. New York: The Psychological Corporation.

Often the schools fail to take advantage of the local press as a means of informing the parents concerning a schoolwide test, either impending or completed. A Florida county a few years ago gave much publicity to the importance of a certain testing program for all students. As a result, the attendance at school that day was the best of the year. No parent wanted his child to miss out on tests which could help him.

The press is willing to inform the public concerning the activi- 
ties of the school, including information about tests. If given the 
material and the proper assistance, newspapers will print a 
worthwhile discussion, including a description of the tests and the 
implications of test results. Such publicity can arouse the in- 
terest of the parents and encourage them to come to a parent- 
teacher meeting to learn more about tests. 






X. Selected References 



Adams, Georgia S. and Torgerson, Theodore L. Measurement and Evaluation for the Secondary School Teacher, with Implications for Corrective Procedures. New York: The Dryden Press, 1956. 658 p.

Ahmann, J. Stanley and Glock, Marvin D. Evaluating Pupil Growth. Boston: Allyn and Bacon, 1958. 605 p.

Ahmann, J. Stanley; Glock, Marvin D.; and Wardeberg, Helen L. Evaluating Elementary School Pupils. Boston: Allyn and Bacon, Inc., 1960. 435 p.

American Council on Education, Committee on Measurement and Evaluation. College Testing, A Guide to Practices and Programs. Washington, D.C.: American Council on Education, 1959. 190 p.

Anastasi, Anne. Psychological Testing. 2d ed. New York: The Macmillan Co., 1961. 657 p.

Anderson, Scarvia B.; Katz, Martin; and Shimberg, Benjamin. Meeting the Test. New York: Scholastic Book Services, 1963. 184 p.

Bauernfeind, Robert H. Building a School Testing Program. Boston: Houghton Mifflin Co., 1963. 343 p.

Bean, Kenneth L. Construction of Educational and Personnel Tests. New York: McGraw-Hill Book Co., Inc., 1953. 231 p.

Berdie, Ralph F.; Layton, Wilbur L.; Swanson, Edward O.; and Hagenah, Theda. Counseling and the Use of Tests, A Manual for the State-Wide Testing Programs of Minnesota. Minneapolis, Minn.: University of Minnesota, 1959. 178 p.

Berdie, Ralph F.; Layton, Wilbur L.; Swanson, Edward O.; and Hagenah, Theda. Testing in Guidance and Counseling. New York: McGraw-Hill Book Co., Inc., 1963. 288 p.

Bradfield, James M. and Moredock, H. Stewart. Measurement and Evaluation in Education. New York: The Macmillan Co., 1957. 509 p.

Chauncey, Henry and Dobbin, John E. Testing: Its Place in Education Today. 1st ed. New York: Harper & Row, Publishers, Inc., 1963. 224 p.

College Entrance Examination Board. Manual of Freshman Class Profiles. 
Box 592, Princeton, N.J.: College Entrance Examination Board. 

Cronbach, Lee J. Essentials of Psychological Testing. 2d ed. New York: Harper & Bros., 1960. 650 p.

Dailey, John T. and Shaycoft, Marion F. Types of Tests in Project Talent. Washington: U.S. Government Printing Office, 1961. 62 p. (U.S. Office of Education, Cooperative Research Monograph, No. 9, OE-250U.)

Davis, Frederick B. Educational Measurements and Their Interpretation. Belmont, Calif.: Wadsworth Publishing Co., 1-




Davis, Frederick B. Item Selection Techniques. Educational Measurement. Washington: American Council on Education, 1951. p. 266-328.

Diederich, Paul B. Statistics for Teacher-made Tests. Princeton, N.J.: Educational Testing Service, 1960. 44 p. (Evaluation and Advisory Service Series, No. 5.)

Doppelt, Jerome E. How Accurate Is a Test Score? Test Service Bulletin, No. 50. New York: The Psychological Corporation, June 1956. p. 1-3.

Doppelt, Jerome E. and Seashore, Harold G. How Effective Are Your Tests? Test Service Bulletin, No. 37. New York: The Psychological Corporation, June 1949. p. 4-10.

Durost, Walter N. The Characteristics, Use, and Computation of Stanines. Test Service Notebook, No. 23. New York: Harcourt, Brace & World, Inc., 1961.

Durost, Walter N. How To Tell Parents About Standardized Test Results. Test Service Notebook, No. 26. Tarrytown, N.Y.: Harcourt, Brace & World, Inc., 1961. 4 p.

Durost, Walter N. Why Do We Test Your Children? Test Service Notebook, No. 17. New York: Harcourt, Brace & World, Inc., 1956. 4 p.

Durost, Walter N. Manual for Interpreting Metropolitan Achievement Tests, Primary I Through Advanced. New York: Harcourt, Brace & World, 1963.

Durost, Walter N. and Prescott, George A. Essentials of Measurement for Teachers. New York: Harcourt, Brace & World, Inc., 1962. 167 p.

Findley, Warren G., ed. The Impact and Improvement of School Testing Programs. 62d Yearbook, Part II. 5837 Kimbark Avenue, Chicago: National Society for the Study of Education, 1963. 304 p.

Freeman, Frank Samuel. Theory and Practice of Psychological Testing. 3d ed. New York: Holt, Rinehart and Winston, Inc., 1962. 697 p.

Froehlich, Clifford P. and Hoyt, Kenneth B. Guidance Testing and Other Student Appraisal Procedures for Teachers and Counselors. 3d ed. Chicago: Science Research Associates, Inc., 1959. 438 p.

Furst, Edward J. Constructing Evaluation Instruments. New York: Longmans, Green and Co., 1958. 334 p.

Gannon, F. B. and Telschow, Earl. Tests and Interpretations: A Teacher’s Handbook. Rochester, N.Y.: City School District, 1960. 31 p.

A Glossary of Measurement Terms: Ninety-six Concepts Which Constitute a Basic Vocabulary in Evaluation and Testing. Del Monte Research Park, Monterey, Calif.: California Test Bureau, 1959. 16 p.

Goldman, Leo. Using Tests in Counseling. New York: Appleton-Century-Crofts, Inc., 1961. 434 p.

Goodenough, Florence L. Mental Testing: Its History, Principles, and Applications. New York: Rinehart and Co., Inc., 1949. 609 p.

Goslin, David A. The Search for Ability: Standardized Testing in Social Perspective. Volume 1 of a Series on the Social Consequences of Ability Testing. New York: Russell Sage Foundation, 1963. 204 p.







Green, John A. Teacher-Made Tests. New York: Harper & Row, Publishers, 1963. 141 p.

Greene, Edward B. Measurements of Human Behavior. Rev. ed. New York: The Odyssey Press, 1952. 790 p.

Hart, Irene. Using Stanines To Obtain Composite Scores Based on Test Data and Teachers’ Ranks. Test Service Bulletin, No. 80. Tarrytown, N.Y.: Harcourt, Brace & World, Inc., 1957. 4 p.

Humphreys, J. Anthony; Traxler, Arthur E.; and North, Robert D. Guidance Services. 2d ed. Chicago: Science Research Associates, Inc., 1960. 414 p.

Jacobs, James N. Aptitude and Achievement Measures in Predicting High School Academic Success. The Personnel and Guidance Journal, 37:334-341, January 1959. Also reprinted as Test Service Bulletin, No. 94. Tarrytown, N.Y.: Harcourt, Brace & World, Inc.

Jordan, A. M. Measurement in Education, An Introduction. New York: McGraw-Hill Book Co., Inc., 1953. 533 p.

Katz, Martin K. Selecting an Achievement Test: Principles and Procedures. (Evaluation and Advisory Service Series, No. 3.) Princeton, N.J.: Educational Testing Service, 1958. 32 p.

Kent Area Guidance Council. A Proposed Twelve-Year Testing Program. Columbus, Ohio: Ohio Scholarship Tests, State Department of Education, March 1959. 57 p.

Lennon, Roger T. Testing in the Secondary School. Test Service Notebook, No. 20. Tarrytown, N.Y.: Harcourt, Brace & World, Inc., 1957. 4 p.

Lennon, Roger T., et al. A Glossary of 100 Measurement Terms. Test Service Notebook, No. 13. Tarrytown, N.Y.: Harcourt, Brace & World, Inc.

Lindquist, E. F., ed. Educational Measurement. Washington, D.C.: American Council on Education, 1951. 820 p.

Lindvall, C. M. Testing and Evaluation: An Introduction. New York: Harcourt, Brace & World, Inc., 1961. 264 p.

Lyman, Howard B. Test Scores and What They Mean. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1963. 223 p.

McCabe, George E. Test Interpretation in the High School Guidance Program. Test Service Bulletin. Tarrytown, N.Y.: Harcourt, Brace & World, Inc. p. 1-3.

McLaughlin, Kenneth F. How Is a Test Built? Understanding Testing. Washington: U.S. Government Printing Office, 1962. p. 4-7. (U.S. Office of Education, OE-25003.) Also reprinted as a Test Service Notebook, No. 25. Tarrytown, N.Y.: Harcourt, Brace & World, Inc.

McLaughlin, Kenneth F., ed. Understanding Testing. Washington: U.S. Government Printing Office, 1962. 24 p. (U.S. Office of Education, OE-25003.)

Noll, Victor H. Introduction to Educational Measurement. Boston: Houghton Mifflin Co., 1957. 437 p.

Remmers, H. H. and Gage, N. L. Educational Measurement and Evaluation. Rev. ed. New York: Harper & Bros., 1955. 650 p.






Remmers, H. H.; Gage, N. L.; and Rummel, J. Francis. A Practical Introduction to Measurement and Evaluation. New York: Harper & Bros., 1960. 370 p.

Ricks, James H., Jr. On Telling Parents About Test Results. Test Service Bulletin, No. 54, December 1959. New York: The Psychological Corporation. 4 p.

Ross, C. C. and Stanley, Julian C. Measurement in Today’s Schools. 3d ed. New York: Prentice-Hall, Inc., 1954.

Rothney, John W. M.; Danielson, Paul J.; and Hermann, Robert A. Measurement for Guidance. New York: Harper & Bros., 1959. 378 p.

Russell, Roger W. and Cronbach, Lee J. Report of Testimony at a Congressional Hearing [to the Senate Committee on Labor and Public Welfare on Feb. 27, 1958]. The American Psychologist 13:219-220, March 1958.

Schwartz, Alfred and Tiedeman, Stuart C. Evaluating Student Progress in the Secondary School. 1st ed. New York: Longmans, Green and Co., 1957. 434 p.

Seashore, Harold G. Methods of Expressing Test Scores. Test Service Bulletin, No. 48. New York: The Psychological Corporation, January 1955. p. 7-9.

Segel, David; Wellman, Frank E.; and Hamilton, Allen T. An Approach to Individual Analysis in Educational and Vocational Guidance. Washington: U.S. Government Printing Office, 1958. 39 p. (U.S. Office of Education, Bulletin 1959, No. 1.)

Stodola, Quentin. How One School System Records and Interprets Test Scores: A Do-It-Yourself Kit for Teachers. Test Service Bulletin, No. 89. Tarrytown, N.Y.: Harcourt, Brace & World, Inc., 1958. 6 p.

Stodola, Quentin. Making the Classroom Test: A Guide for Teachers. Princeton, N.J.: Educational Testing Service, 1959. 28 p. (Evaluation and Advisory Service Series, No. 4.)

Testing Guide for Teachers. Prepared by the Technical Subcommittee of the Independent Schools Advisory Committee. New York: Educational Records Bureau, 1961. 43 p.

Thomas, R. Murray. Judging Student Progress. New York: Longmans, Green and Co., 1954. 421 p.

Thorndike, Robert L. and Hagen, Elizabeth. Measurement and Evaluation in Psychology and Education. 2d ed. New York: John Wiley & Sons, Inc., 1961. 602 p.

Travers, Robert M. W. Educational Measurement. New York: The Macmillan Co., 1955. 420 p.

Traxler, Arthur E. Techniques of Guidance. Rev. ed. New York: Harper & Bros., 1957. 374 p.

Traxler, Arthur E.; Jacobs, Robert; Selover, Margaret; and Townsend, Agatha. Introduction to Testing and the Use of Test Results in Public Schools. New York: Harper & Bros., 1953. 113 p.






Triggs, Frances Oralind. Reading: Its Creative Teaching and Testing: Kindergarten Through College. Mountain Home, N.C.: Frances Oralind Triggs, Chairman, The Committee on Diagnostic Reading Tests, Inc., 1960. 150 p.

Tyler, Leona E. Tests and Measurements. Foundations of Modern Psy- 
chology Series. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1963. 116 p. 

Wandt, Edwin and Brown, Gerald W. Essentials of Educational Evaluation. New York: Henry Holt and Co., Inc., 1957. 117 p.

Wesman, Alexander G. Aptitude, Intelligence, and Achievement. Test Service Bulletin, No. 51, December 1956. New York: The Psychological Corporation. p. 4-6.

Wesman, Alexander G. Expectancy Tables: A Way of Interpreting Test Validity. Test Service Bulletin, No. 38. New York: The Psychological Corporation, December 1949. p. 11-15.

Willey, Clarence F. Simplified Item Analysis. Public Personnel Review 
14:24-25, January 1953. 

Womer, Frank B. Initiating a Testing Program. The Elementary School Journal 57:193-97, January 1957. Also reprinted as a Test Service Bulletin, No. 14. Boston: Houghton Mifflin Co. 3 p.

Wrightstone, J. Wayne; Justman, Joseph; and Robbins, Irving. Evaluation in Modern Education. New York: American Book Co., 1956. 481 p.



U.S. GOVERNMENT PRINTING OFFICE



